1.  INTRODUCTION

1.1  The main purpose of this appendix is to provide a mathematical
description of the Reliability-Centered Maintenance Program developed
by United Air Lines and described in [6] and [7]. Although a mathematical
formulation may not make it any easier to implement this program,
by  placing  it  in  a broader  context  we  hope  to  emphasize  the  generality 
of  its  underlying  principles  and  encourage  their  application  to  complex 
systems  other  than  commercial  air  fleet  maintenance  operations. 

Another  purpose  of  this  appendix  is  to  provide  a brief  but  coherent 
introduction  to  those  aspects  of  the  theory  of  probability  necessary  for 
an  understanding  of  the  theoretical  basis  for  the  Reliability-Centered 
Maintenance  Program.  This  account  differs  appreciably  from  the  presen- 
tations usually  found  in  textbooks  on  reliability  theory:  standard 
treatises  concentrate  on  the  functions  associated  with  reliability  and 
on  their  analytical  manipulation.  Here  we  focus  on  the  underlying  sets 
of  items  and  events  and  on  their  mutual  relationships.  There  are  two 
principal  reasons  for  this  difference  of  approach,  a difference  which 
is in large measure fundamental to the philosophy underlying
Reliability-Centered Maintenance.

The  first  reason  is  that  collections  of  operational  commercial  and 
military gas-turbine-engined aircraft are among the most complex systems
evolved  by  civilization.  A single  aircraft  consists  of  tens  of  thousands 
of  interrelated  parts  whose  integrated  and  harmonious  operation  is  neces- 
sary for  successful  completion  of  the  aircraft's  mission.  These  consti- 
tuent parts,  assemblies,  and  subsystems  exhibit  every  extreme  and  inter- 
mediate aspect  of  reliability  behavior.  For  this  reason  alone — complexity 
due  to  diversity — there  can  be  no  hope  for  a complete  analytical  descrip- 
tion of  reliability  properties  which  could  form  the  basis  for  development 
of  an  optimal  maintenance  policy.  Aircraft,  and  aircraft  systems,  consist 
of sets of constituent parts: sets having a large number of elements,
sets whose elements are related in complicated ways. Consequently, our
attention must be primarily (although not exclusively) directed to
consideration of aircraft and aircraft systems as sets.

The  second  reason  is  more  subtle.  It  has  been  said  that  the 
principal  problem  facing  the  designer  of  a maintenance  policy  for  air- 
craft operations  is  one  of  information.  It  would  be  more  accurate  to 
assert  that  the  problem  is  one  of  lack  of  information.  One  of  the  most 
important  contributions  of  the  Reliability-Centered  Maintenance  Program 
is  its  explicit  recognition  that  certain  types  of  information  heretofore 
actively sought as a product of maintenance activities are, in principle
as well as in practice, unobtainable. The twentieth century has identified
uncertainty as a fundamental principle on whose shifting sands
profound and powerful theories have been erected: Gödel's Incompleteness
Theorem in mathematical logic and Heisenberg's Uncertainty Principle
in quantum physics stimulated rather than stifled progress, the spawn of
the latter including microelectronics as well as nuclear science. The
Reliability-Centered Maintenance Program extends these philosophical
views to reliability engineering by elevating the unobtainability of
information to a positive principle. This is a consequence of the following
observation: the only information-bearing events which are of
ultimate  significance  to  the  aircraft  maintenance  policy  designer  are 
failures,  and  among  these  the  critical  failures  bear  the  greatest  amount 
of  information.  Thus,  the  task  of  the  maintenance  policy  designer  is  to 
minimize  information.  In  most  other  comparable  circumstances  failure 
information  is  avidly  sought,  through  prototype  testing  and  sampling 
procedures,  but  those  traditional  approaches  are  inapplicable  here. 

Fleets  consist  of  a relatively  small  number  of  aircraft  which  are  in  a 
continuous  state  of  evolution  and  modification  and  which  are  brought 
into  operation  in  a serial  rather  than  simultaneous  manner.  Hence  sample 
sizes  are  generally  too  small  for  statistical  procedures  to  carry  much 
conviction,  and  for  the  leading  edge  of  high-time  aircraft  they  are 
always  too  small.  In  such  an  environment  actuarial  procedures  are  of 
relatively  little  use  because  the  operating  lifetime  of  an  aircraft  (in 
a fixed  configuration)  is  relatively  brief.  Actuarial  analyses  provide 
interesting historical information about the effectiveness of maintenance
policies and design features, but they cannot be a basis for maintenance
policies. 

Acquisition  of  the  information  most  needed  by  maintenance  policy 
designers— information  about  critical  failures— is  in  principle  unaccept- 
able and  is  evidence  of  failure  of  the  maintenance  program.  Critical 
failures  entail  potential  (in  certain  cases,  probable)  loss  of  life,  but 
there  is  no  rate  of  loss  of  life  that  is  acceptable  to  a common  carrier 
or  military  organisation  as  the  price  of  failure  information  to  be  used 
for  designing  a maintenance  policy.  Thus  the  policy  designer  is  faced 
with  the  problem  of  creating  a maintenance  system  for  which  the  expected 
loss  of  life  will  be  less  than  1 over  the  planned  operational  lifetime 
of  the  aircraft.  This  means  that,  both  in  practice  and  in  principle, 
the  policy  must  be  designed  without  using  experiential  data  which  will 
arise from the failures the policy is meant to avoid.

Maintenance  policy  designers  do  have  the  advantage  of  experience 
gathered  from  operation  of  previous  generations  of  aircraft.  Although 
those  aircraft  are  different,  both  in  the  design  and  fabrication  of  many 
of  their  constituent  parts  and  in  the  relationships  among  those  parts, 
it  is  nevertheless  true  that  many  constituents  are  unchanged,  and  most 
changes  are  minor  and  evolutionary  rather  than  revolutionary.  There  is, 
consequently,  a certain  continuity  from  one  generation  of  aircraft  to 
the  next  which  is  utilised  in  an  informal  way  by  experienced  maintenance 
engineers  and  aircraft  designers.  Although  it  is  difficult  to  formulate 
this  aspect  of  policy  design  in  mathematical  terms,  the  theoretician 
should  not  be  deterred  from  the  task  because  prior  experience  is  probably 
the  major  single  source  of  information  which  can  be  used  for  maintenance 
policy  design. 

In short, maintenance policy design is a problem of information and
of  statistics.  N.  Wiener  [15]  and  A.  N.  Kolmogorov  [5]  were  among  the 
first  to  recognise  the  close  relationship  between  statistics  and  infor- 
mation, particularly  with  regard  to  communication  theory.  C.  Shannon  [11] 
expanded  and  developed  their  ideas  to  create  a rigorous  and  useful  in- 
formation theory.  The  application  of  Shannon's  theory  to  maintenance 
policies requires that both of the concepts of information and reliability
be formulated in terms of the structure of the sets of constituents
of aircraft, and of functions defined on those sets. Again the desirability
of a set-oriented presentation of reliability is underscored.

1.2  We will summarize the contents of the subsequent sections. Section 2,
Elements of Probability, introduces the basic concepts and
relationships  employed  throughout  this  work.  The  notion  of  a measurable 
space,  which  consists  of  a set  whose  elements  are  the  items  of  interest, 
a distinguished  collection  of  subsets  called  events,  and  a probability 
measure  which  measures  the  likelihood  of  an  event,  is  central.  Random 
variables  are  introduced  as  functions  defined  on  the  set  of  items  and 
compatible  with  the  structure  specified  by  the  collection  of  events. 

The  distribution  function  associated  with  a random  variable  and  proba- 
bility measure  is  often  the  starting  point  in  treatments  of  reliability 
theory.  This  necessitates  a brief  description  of  the  three  possible 
types  of  distribution  functions.  The  remainder  of  the  work  is  restricted 
to  distribution  functions  which  are  linear  combinations  of  absolutely 
continuous distributions (that is, those which have a corresponding
density function) and discrete distributions. The discussion and notation
are arranged in a sufficiently general manner to permit a unified
treatment  of  both  types  of  distributions  as  well  as  combinations  of  them. 
Combined  distributions  are  not  merely  academic  curiosities.  Whenever  a 
system  is  operated  continuously  over  a period  with  numerous  brief  (dis- 
crete) intervals  of  peak  stress  having  special  characteristics,  its 
survival  distribution  will  be  a linear  combination  of  an  absolutely  con- 
tinuous distribution  corresponding  to  the  continuous  mode  of  operation 
and  a discrete  distribution  corresponding  to  the  peak-stress  operation. 

A tungsten-filament  light  bulb  provides  a simple  example.  When  operated 
continuously its survival characteristics are related to continuous
filament  evaporation.  When  the  controlling  switch  is  first  turned  "on," 
the  cold  filament  is  heated  rapidly  and  undergoes  thermal  stresses. 

These  loads  evidently  depend  on  the  history  of  the  switching  activity 
and  yield  a discrete  distribution.  Similar  phenomena  occur  in  aircraft 
operation, particularly in hot areas of gas turbine engines.


These circumstances demand consideration of the Lebesgue-Stieltjes
integral. The latter is not as commonly used in the literature as it
should be. We present a brief and, we hope, readily accessible definition
of this integral and description of those properties needed for the
applications. The discussion is based on the integration-by-parts
formula familiar from elementary calculus.

The derivative of an absolutely continuous distribution is called
a density function. For instance, the normal density function is
$\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}$. Discrete distributions do not have derivatives in the
ordinary sense, so it is not possible to unify the treatment of densities of
combined distributions without generalizing the concept of function.
The required generalization is the generalized function known as the
Dirac delta function, long used by engineers.

With  these  preliminaries  in  hand,  conditional  probabilities  are 
defined  and  Bayes'  Principle  of  Inverse  Probability  is  introduced. 

Bayes' Principle is a consequence of a certain symmetry of roles played
by observations and hypotheses, a symmetry most readily made evident by
the set-theoretic formulation of these concepts. This symmetry, and
Bayes' Principle, are of special importance to us because they provide
the formal mechanism for the conversion of prior observations, e.g.,
survival data for constituents of a currently obsolete aircraft, into
current hypotheses, e.g., initial specifications for hard-time maintenance.
This application of Bayes' Principle is taken up in Section 7.

Section  3,  Terminology  of  Reliability  Theory,  applies  the  general 
development of the previous section to the particular circumstances of
reliability problems. The main features in this application are two:
first, time t is a random variable, and the events are parameterized
by t; and second, the event associated with t is interpreted as the
set of items which failed prior to t. Failure and survival distributions
are introduced, and it is shown how to calculate the mean time
before failure. Failure density is defined and used to introduce the
important concept of the hazard rate, also known as the failure rate.

The  hazard  rate  has  two  useful  properties.  The  survival  distribution 
can be expressed in terms of the hazard rate. Moreover, the hazard rate
of a collection of independently failing items is the sum of the hazard
rates of the individual items.
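
The two properties just mentioned can be illustrated numerically. The sketch below assumes the standard relation S(t) = exp(-∫₀ᵗ h(s) ds) between a survival distribution S and its hazard rate h, and it reads "collection" as a group that is counted as failed as soon as any one of its items fails; that reading is an assumption made for the illustration. The hazard rates `h1` and `h2` are made-up values, not data from the program.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral_0^t h(s) ds) with the midpoint rule."""
    dt = t / steps
    cumulative = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative)

# Hypothetical hazard rates (failures per hour) for two independent items.
h1 = lambda s: 0.002             # constant hazard
h2 = lambda s: 0.0001 * s        # linearly increasing hazard
h_total = lambda s: h1(s) + h2(s)  # hazard of the pair: sum of the two

t = 50.0
s1 = survival_from_hazard(h1, t)
s2 = survival_from_hazard(h2, t)
s_both = survival_from_hazard(h_total, t)

# Independence means the pair survives with probability s1 * s2,
# which agrees with the survival computed from the summed hazard.
print(s1, s2, s_both, s1 * s2)
```

Adding hazard rates corresponds to multiplying survival probabilities, which is what independence of the items requires.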

Section 4, Useful Survival Distributions, introduces five survival
distributions which appear frequently in the literature: exponential,
normal, Weibull, lognormal, and gamma. In each case the corresponding
density and hazard functions are displayed. The survival characteristics
of various jet engines or their subsystems are often accurately approximated
by one of these distributions. An example of such an application
is  supplied  for  each  one. 

The exponential distribution plays a unique role among survival
distributions. Since its hazard rate is constant, it separates the distributions
which have increasing hazard rates from those which have decreasing
hazard rates. Thus it also separates two fundamentally distinct
classes of maintenance policies, since in the former case replacement of
old by new items reduces failure rate and can, under certain circumstances,
be cost-effective, whereas in the latter, replacement of old by new is
only  reasonable  after  failure. 
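
The separating role of the exponential distribution can be seen within the Weibull family, one of the five distributions named above. The sketch below uses the usual two-parameter Weibull hazard rate, (k/s)(t/s)^(k-1) with shape k and scale s; the function names and the sample ages are chosen only for illustration. Shape 1 gives the exponential distribution with its constant hazard rate.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Hazard rate of a Weibull distribution: (k/s) * (t/s)**(k-1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def hazard_trend(shape):
    """Classify the hazard trend by comparing the hazard at two sample ages."""
    early = weibull_hazard(1.0, shape)
    late = weibull_hazard(2.0, shape)
    if late > early:
        return "increasing"   # replacing old items by new ones lowers the failure rate
    if late < early:
        return "decreasing"   # replacing old items by new ones raises the failure rate
    return "constant"         # the exponential boundary case

for k in (0.5, 1.0, 2.0):
    print(k, hazard_trend(k))
```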

Section  5,  Simple  and  Complex  Systems,  considers  infant  mortality 
and wear out as components of the general hazard function. Simple systems,
consisting either of one cell or of symmetrically interconnected replicas
of  one  cell,  are  contrasted  with  complex  systems.  The  principal  conclu- 
sion is  that  complex  systems  are  not  amenable  to  complete  mathematical 
reliability  analysis. 

Section  6,  Reliability-Centered  Maintenance,  is  the  heart  of  this 
paper.  Mathematical  reliability  analysis  of  an  aircraft  is  impossible 
because  the  latter  consists  of  tens  of  thousands  of  diverse  parts.  The 
United  Air  Lines  Reliability-Centered  Maintenance  Program  presents  a 
method for grouping parts and assemblies into functionally related subsystems
and systems, and for systematically eliminating certain of them
from  maintenance  policy  considerations.  The  purpose  of  Section  6 is  to 
represent  this  procedure  in  mathematical  terms. 

The collection of types of items which are part of an aircraft is
considered as a set with an associated survival distribution. This set
is  partitioned  into  maximally  independent  elements  which  loosely  corre- 
spond to  the  partition  described  in  the  Reliability-Centered  Maintenance 
Program.  To  each  independent  element  is  assigned  a cost  function  which 
includes  the  direct  and  indirect  estimated  costs  of  a failure  in  addi- 
tion to  the  costs  associated  with  the  maintenance  program  under  consider- 
ation. The  objective  of  the  maintenance  program  designer  is  to  minimise 
the  sum  of  these  cost  functions. 

Although  this  minimization  problem  is  too  complicated  to  admit  a 
purely  mathematical  solution,  it  is  nevertheless  arranged  in  a form  which 
makes  it  possible  to  recursively  and  systematically  revise  the  maintenance 
policy  so  that  total  cost  is  reduced  by  each  revision  cycle.  In  fact, 
since  the  cost  function  is  the  sum  of  the  cost  functions  associated  with 
the  elements  of  the  maximally  independent  partition,  it  follows  that  any 
policy  modification  which  reduces  the  cost  function  for  one  independent 
element  while  leaving  the  maximally  independent  partition  unchanged  must 
necessarily  reduce  the  total  cost  function.  Hence,  iteration  of  this 
procedure  of  local  cost  reduction  without  changing  the  partition  will 
lead,  in  the  limit,  to  a local  minimum  of  the  total  cost  function.  There 
is  no  way  to  prove  that  this  local  minimum  will  be  the  global  minimum, 
nor  is  there  as  yet  an  analytical  way  to  estimate  or  speed  up  the  rate 
of  convergence  to  the  local  minimum.  Nevertheless,  this  procedure, 
which reflects the essence of the Reliability-Centered Maintenance Program,
assures the maintenance policy designer that the program is
self-improving.
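
The argument above can be sketched in a few lines. Everything in the example is hypothetical: each independent element is given a single policy parameter (a maintenance interval) and a made-up cost function, and a revision cycle keeps a change only when it lowers the cost of the element it touches. Because the total cost is the sum of the element costs, the total can never increase from one cycle to the next.

```python
def element_cost(interval, failure_cost, task_cost):
    """Hypothetical cost of one independent element as a function of its
    maintenance interval: failure cost grows with the interval, while the
    cost of the maintenance tasks shrinks."""
    return failure_cost * interval + task_cost / interval

def total_cost(policy, elements):
    """Total cost is the sum over the independent elements."""
    return sum(element_cost(x, *params) for x, params in zip(policy, elements))

def revise_once(policy, elements, step=0.05):
    """One revision cycle: for each element try a slightly longer and a slightly
    shorter interval, keeping a change only if that element's cost drops."""
    revised = list(policy)
    for i, params in enumerate(elements):
        for candidate in (revised[i] * (1 + step), revised[i] * (1 - step)):
            if element_cost(candidate, *params) < element_cost(revised[i], *params):
                revised[i] = candidate
                break
    return revised

elements = [(2.0, 8.0), (1.0, 9.0)]   # (failure_cost, task_cost), made-up values
policy = [1.0, 1.0]                   # initial intervals, made-up values
history = [total_cost(policy, elements)]
for _ in range(200):
    policy = revise_once(policy, elements)
    history.append(total_cost(policy, elements))

print(history[0], history[-1], policy)
```

The sequence `history` is non-increasing and settles near a local minimum; nothing in the procedure shows that this minimum is global.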

The  section  closes  with  presentation  of  a geometrical  model  of  the 
Reliability-Centered Maintenance Program. The maintenance/failure cost
function, considered as a function of time and the policy parameters,
defines a surface in a multi-dimensional space. The program defines an
iterative  procedure  for  locating  a local  minimum  (as  a function  of  time) 
on  this  surface. 

Section  7,  Information  and  Maintenance  Policies,  returns  to  the 
theme  discussed  earlier  in  this  Introduction,  that  the  most  important 
information available to the maintenance policy designer is provided by
failure experience. The designer cannot plan on the availability of
such information. Three aspects of this problem are discussed in Section 7.
First, the geometrical interpretation of the Reliability-Centered
Maintenance  Program  presented  at  the  end  of  Section  6 is  elaborated  in 
order  to  show  why  that  program  can  succeed  using  only  the  small  amount  of 
information  which  is  actually  available.  In  essence,  the  program  seeks 
valleys  on  the  multi-dimensional  surface  defined  by  the  maintenance/ 
failure  cost  function.  It  achieves  its  objective  by  identifying  a 
direction  of  decreasing  cost  on  the  surface  at  the  point  corresponding 
to  the  maintenance  policy  in  effect,  and  then  moving  along  the  surface 
(i.e.,  modifying  the  policy)  in  that  direction.  If  the  distance  moved 
is  sufficiently  small,  iteration  of  this  process  converges  to  a valley 
point  on  the  surface,  that  is,  to  a local  minimum  of  the  cost/maintenance 
function.  The  central  fact  is  that  relatively  little  information  is 
needed  to  determine  a direction  of  decreasing  cost. 

The  difficult  problem  of  optimizing  the  size  of  the  policy  change 
at each iteration of the program is discussed next. More information
is needed to assess this 'step' size than to merely identify downward
directions  on  the  surface  because  the  former  depends  on  the  magnitude  of 
the  derivatives  of  the  functions  defining  the  surface. 
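
A one-parameter sketch shows why the step size depends on the magnitude of the derivatives. The cost function c(x) = 5(x - 1)^2 is a made-up example with derivative c'(x) = 10(x - 1); each iteration moves the policy parameter x against the derivative by an amount proportional to `step`. Both runs below move in a direction of decreasing cost at every iteration, but only the smaller step settles into the valley.

```python
def gradient_steps(start, step, iterations=50):
    """Repeatedly move downhill on the hypothetical cost c(x) = 5 * (x - 1)**2."""
    x = start
    for _ in range(iterations):
        x -= step * 10.0 * (x - 1.0)   # c'(x) = 10 * (x - 1)
    return x

cautious = gradient_steps(start=0.0, step=0.05)   # error halves at each iteration
reckless = gradient_steps(start=0.0, step=0.25)   # overshoots further each time

print(cautious, reckless)
```

For this cost function the iteration converges only when `step` is below 0.2, a threshold fixed by the size of the derivative rather than by its sign.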

There  follows  a brief  discussion  of  the  applicability  of  statistical 
methods to complex long-lived systems having few replicas. The physical
universe  itself  provides  one  example  of  such  a system.  Insofar  as 
statistical  methods  are  conceived  as  an  analytical  apparatus  for  describ- 
ing sample  variation,  it  appears  that  they  cannot  be  relied  on  to  monitor 
or  analyze  the  reliability  of  complex  systems.  An  alternative  view, 
based  upon  Gibbs’  concept  of  a virtual  ensemble  of  systems,  is  presented. 
From  this  standpoint,  statistics  emerge  as  a selection  principle  which 
identifies  a system  among  the  virtual  ensemble  of  its  alternatives 
which  are  compatible  with  the  non-statistical  laws  of  nature. 

Information  plays  a central  role  in  the  Reliability-Centered  Mainte- 
nance Program  and  also  in  the  discussions  presented  throughout  this 
appendix, but especially in Sections 5-7. In the final subsection, the
quantity of information associated with a given survival distribution
and inspection intervals is defined and then applied to the determination
of the inspection intervals such that each interval produces the
same amount of information. It is found, in agreement with expectations,
that extension of replacement and/or inspection intervals is justified
during periods of declining hazard rate.

A Glossary of Notation and Terminology follows Section 7.


2.  ELEMENTS  OF  PROBABILITY 


2.1  Theories of maintenance and reliability are ultimately based upon
the theory of probability and on the properties of various distribution
functions which have been found, either through repeated observation and
experience, or by means of theoretical analyses, to occur frequently and
play a role in the description and prediction of survival characteristics.

In  this  section  we  provide  a brief  summary  of  the  concepts  and 
mathematical  structures  used  in  the  theory  of  probability  in  order  to 
introduce  the  notations  and  techniques  which  will  be  used  later,  and 
also  to  delimit  the  range  of  our  subject. 

Probability  theory  is  concerned  with  events  and  measures  of  the 
likelihood  of  their  occurrence.  These  commonly  used  words  are  given 
precise meaning by introduction of the fundamental concept of a measurable
space. Let $U$ denote an arbitrary non-empty set and $\Omega$ a collection of
subsets of $U$ such that*

$$U \in \Omega ; \tag{2.1.1}$$

$$\omega_1 \cap (U - \omega_2) \in \Omega \ \text{ whenever } \omega_1 \in \Omega \text{ and } \omega_2 \in \Omega ; \tag{2.1.2}$$

$$\bigcup_{i=1}^{\infty} \omega_i \in \Omega \ \text{ whenever } \omega_i \in \Omega,\ i = 1, 2, 3, \ldots \ . \tag{2.1.3}$$

The elements of $\Omega$ are called events. Eq. (2.1.1) states that the set
$U$ (the "universe" of discourse) is an event; the meaning of
eq. (2.1.2) is indicated by Figure 2.1 below, and eq. (2.1.3) asserts
that any sequence of events can be combined to form an event.
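
For a finite universe the three conditions can be checked mechanically, since a countable union of events then reduces to a finite one. The following sketch is an illustration only; the function name and the three-item universe are hypothetical.

```python
def is_event_collection(universe, events):
    """Check conditions (2.1.1)-(2.1.3) for a finite universe, where closure
    under countable unions reduces to closure under pairwise unions."""
    universe = frozenset(universe)
    events = {frozenset(e) for e in events}
    if universe not in events:                       # (2.1.1)
        return False
    for a in events:
        for b in events:
            if a & (universe - b) not in events:     # (2.1.2)
                return False
            if a | b not in events:                  # (2.1.3)
                return False
    return True

U = {"pump", "valve", "sensor"}                      # hypothetical items
good = [set(), {"pump"}, {"valve", "sensor"}, U]
bad = [{"pump"}, U]                                  # U - {"pump"} is missing

print(is_event_collection(U, good), is_event_collection(U, bad))
```

Condition (2.1.2) with the first event taken to be the whole universe shows that the complement of every event is again an event, which is why the second collection fails.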


*See the Glossary of Notations and Terminology for definitions of $\in$,
$\cap$, $\cup$, etc.

[Figure 2.1. Illustrating Eq. (2.1.2)]

We wish to assign some measure to the probability of occurrence of an
event. This is achieved by considering a function

$$\mathbf{P} : \Omega \to [0,1] \tag{2.2}$$

which associates to each event a number between 0 and 1 inclusive such
that

$$\mathbf{P}(U) = 1 ; \tag{2.3.1}$$

$$\mathbf{P}\Big(\bigcup_{i=1}^{\infty} \omega_i\Big) = \sum_{i=1}^{\infty} \mathbf{P}(\omega_i) \tag{2.3.2}$$

whenever the $\omega_i$ are disjoint events, that is, $\omega_i \cap \omega_j = \emptyset$ for
$i \neq j$. Such a function $\mathbf{P}$ is called a probability measure. In order to
emphasize that a probability measure is a function defined on sets rather
than on numbers, we use bold face type to denote it.

A probability measure is defined on the collection of events $\Omega$,
that is, on a collection of certain subsets of $U$. It is also important
to be able to consider functions defined on $U$ itself, but not every
such function can be effectively studied by analytical means, so it becomes
necessary to identify a special family of functions on $U$ which
can be conveniently and effectively studied. These are called random
variables and are specified as follows.

Suppose $f : U \to \mathbb{R}$ is a real-valued function (see Figure 2.2).
Each real number $x$ can be used to specify a subset $\omega(x)$ of $U$ by
putting

$$\omega(x) = \{\xi \in U : f(\xi) \le x\} . \tag{2.4}$$

[Figure 2.2. Schematic Diagram of a Random Variable. The figure shows
$\omega(x_2) = \{\xi \in U : f(\xi) \le x_2\}$ and notes that $x_1 < x_2$ implies
$\omega(x_1) \subset \omega(x_2)$.]

The set $\omega(x)$ may be an event, that is, $\omega(x)$ may belong to $\Omega$. If
$\omega(x)$ is an event for every choice of a real number $x$, then the function
$f$ is called a random variable. The property of being a random variable
depends on the collection of events $\Omega$ as well as on the particular
function $f$.
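
The dependence on the collection of events can be made concrete for a finite universe. In the sketch below, which uses hypothetical names and values throughout, the same collection of events admits one function as a random variable and rejects another. Only the values actually attained by the function need to be tested, together with the empty set, which arises when x lies below every value.

```python
def is_random_variable(f, universe, events):
    """Return True if {xi in U : f(xi) <= x} is an event for every real x.
    On a finite universe it suffices to test the attained values of f and
    the empty set (obtained when x is below every value of f)."""
    events = {frozenset(e) for e in events}
    if frozenset() not in events:
        return False
    for x in set(f.values()):
        level_set = frozenset(xi for xi in universe if f[xi] <= x)
        if level_set not in events:
            return False
    return True

U = {"a", "b", "c"}
events = [set(), {"a"}, {"b", "c"}, U]     # hypothetical collection of events
f = {"a": 1.0, "b": 2.0, "c": 2.0}         # cannot tell "b" from "c"
g = {"a": 1.0, "b": 2.0, "c": 3.0}         # separates "b" from "c"

print(is_random_variable(f, U, events), is_random_variable(g, U, events))
```

Here `g` fails because the set {"a", "b"} is not among the events; with the collection of all subsets of `U` it would qualify.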

The concept of integration plays an essential role in probability
and statistics, and hence in the theory of reliability. A random variable
$f$ is called integrable if it can be integrated over the whole
space $U$ with respect to the probability measure $\mathbf{P}$. The integral is
understood in the sense of Lebesgue (cp. [12], [15]). In many practical
situations this integral can be expressed in terms of the ordinary Riemann
integral and/or a series summation. We will have occasion to say more
about this below.

The integral of a random variable $f$ over the whole space $U$ with
respect to the probability measure $\mathbf{P}$ will be written

$$\int_U f \, d\mathbf{P} . \tag{2.5}$$

From eq. (2.3.1) it follows that

$$\int_U d\mathbf{P} = 1 . \tag{2.6}$$

If $\omega$ is an event, then the function $c_\omega$ defined by

$$c_\omega(\xi) = \begin{cases} 1 & \text{if } \xi \in \omega \\ 0 & \text{if } \xi \notin \omega , \end{cases} \tag{2.7}$$

called the indicator function of the event $\omega$, is a random variable, and
the product $c_\omega f$ is also a random variable whenever $f$ is.

Using this product, we define the integral of $f$ over the event
$\omega$ by

$$\int_\omega f \, d\mathbf{P} = \int_U c_\omega f \, d\mathbf{P} . \tag{2.8}$$


The integral of the random variable $f$ over the whole space $U$ is
called the mean value of $f$ or also the expectation of $f$, and will be
denoted by $\bar{f}$:

$$\bar{f} = \int_U f \, d\mathbf{P} . \tag{2.9}$$

The number $\int_\omega f \, d\mathbf{P}$ can be thought of as the mean value of $f$ on the
event $\omega$.


The variance of the random variable $f$ (which is also the square of
the standard deviation $\sigma(f)$ of $f$) is defined by

$$\sigma(f)^2 = \overline{(f - \bar{f})^2} , \tag{2.10}$$

that is,

$$\sigma(f)^2 = \int_U (f - \bar{f})^2 \, d\mathbf{P} . \tag{2.11}$$

Notation is somewhat abused in this equation; the number $\bar{f}$ (the mean
value of the random variable $f$) is used to stand for the random variable
$\bar{f} \cdot 1$, where $1$ is the random variable which takes on the value 1 for
each element $\xi \in U$.
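
On a finite universe in which every subset is an event, the integrals above reduce to weighted sums. The sketch below is an illustration under that assumption; the point probabilities and the values of the random variable are made up, and the helper `integral` is a hypothetical name.

```python
def integral(f, prob, event=None):
    """Integral of f over an event with respect to P on a finite universe.
    prob maps each element xi to P({xi}); event defaults to the whole universe."""
    points = prob.keys() if event is None else event
    return sum(f[xi] * prob[xi] for xi in points)

prob = {"a": 0.5, "b": 0.25, "c": 0.25}     # hypothetical point probabilities
f = {"a": 0.0, "b": 4.0, "c": 8.0}          # hypothetical random variable

mean = integral(f, prob)                                # eq. (2.9)
squared_deviation = {xi: (f[xi] - mean) ** 2 for xi in f}
variance = integral(squared_deviation, prob)            # eq. (2.11)
over_event = integral(f, prob, event={"b", "c"})        # eq. (2.8)

print(mean, variance, over_event)
```

Subtracting the number `mean` from each value of `f` is exactly the abuse of notation described above: the number stands for the constant random variable that takes that value everywhere.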




In probability and statistics one is most often interested in the
probability measure of the largest event $\omega$ for which a random variable
$f$ satisfies $f(\xi) \le x$ for all $\xi \in \omega$, where $x$ is some real number.
Since $f$ is a random variable, $\omega_f(x) = \{\xi \in U : f(\xi) \le x\}$ is an event, so
by definition

$$P_f(x) = \mathbf{P}(\omega_f(x)) \tag{2.12}$$

is a function of $x$ which varies from 0 to 1. If the random variable
$f$ is fixed for the discussion, or otherwise understood, then the
subscript $f$ will be omitted and we will write $P(x)$ in place of $P_f(x)$.
This function $x \mapsto P(x)$ is called the distribution function of the random
variable $f$; it is the distribution function customarily used in statistics.
In order to emphasize that $P$ is a function of a numerical variable
rather than a set function it is printed in ordinary Roman type.

A distribution  function  has  the  following  properties: 

$$x \mapsto P(x) \ \text{ is a non-decreasing function;} \tag{2.13.1}$$

$$P(x) = P(x+0) , \tag{2.13.2}$$

where $P(x+0) = \lim_{h \searrow 0} P(x+h)$ ($h$ approaches 0
through positive values; thus $P$ is continuous
from the right);

$$P(-\infty) = 0, \quad P(+\infty) = 1 , \tag{2.13.3}$$

where $P(\pm\infty) = \lim_{x \to \pm\infty} P(x)$.
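
These properties can be observed directly for a random variable on a finite universe, where the distribution function is a step function. The sketch below is illustrative only; the point probabilities and values are made up, and every subset of the universe is assumed to be an event.

```python
def distribution_function(f, prob):
    """Return P_f, where P_f(x) = P({xi : f(xi) <= x}) on a finite universe."""
    def P(x):
        return sum(p for xi, p in prob.items() if f[xi] <= x)
    return P

prob = {"a": 0.5, "b": 0.25, "c": 0.25}     # hypothetical point probabilities
f = {"a": 0.0, "b": 4.0, "c": 8.0}          # hypothetical random variable
P = distribution_function(f, prob)

samples = [-1.0, 0.0, 3.9, 4.0, 8.0, 100.0]
print([P(x) for x in samples])
```

The value at a jump point such as x = 4 already includes the jump, which is the continuity from the right asserted in eq. (2.13.2).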

We started with a collection $\Omega$ of sets called events and associated a
probability measure $\mathbf{P}$ with these sets. The values assumed by $\mathbf{P}$ are
real numbers in the interval [0,1]. Then we defined random variables
and their associated integrals relative to the probability measure. By
means of the latter we have been able to construct an interplay between
functions defined on sets, such as probability measures and random
variables, and functions defined on real numbers. The distribution
function $P_f : \mathbb{R} \to [0,1]$ provides a method for completing this transference
by defining a measure on sets of real numbers which will correspond
to $\mathbf{P}$ and thereby enable us to express all of the integrals which occur
in terms of integrals of functions of real numbers, rather than as integrals
of functions of sets. This is important because the analysis of
functions of real numbers is highly developed and well known. This nice
property is achieved by defining a measure on $\mathbb{R}$ associated with the
distribution function $P_f$ in the following way.

If $(a,b] = \{x \in \mathbb{R} : a < x \le b\}$, then define the measure

$$\mu_P((a,b]) = P(b) - P(a) \tag{2.14}$$

where $P = P_f$ is the distribution function of the random variable $f$.
Since $P(x) = P_f(x)$ is the probability that the random variable $f$
assumes a value $\le x$, it follows that $\mu_P((a,b])$ is the probability that
$f$ assumes a value in the half-open interval $(a,b]$. $\mu_P$ is a measure on
the real line. The integral of a real-valued function $g : \mathbb{R} \to \mathbb{R}$ with
respect to this measure is called the Lebesgue-Stieltjes integral of $g$
with respect to $\mu_P$, written

$$\int g(x) \, d\mu_P = \int_{-\infty}^{+\infty} g(x) \, dP_f(x) . \tag{2.15}$$
Use  of  the  Lebesgue-Stieltjes  integral  unifies  the  treatment  of 
discrete  probability  distributions  and  probability  distributions  which 
have density functions. Nevertheless, the Lebesgue-Stieltjes integral
has not yet become a standard part of the education of those who use
statistics nor an explicitly used tool in most reference books. This is
no doubt due to the greater technical complications of developing the
properties of this integral in the most general setting (cp., e.g., [12]).
Fortunately, for the cases of interest to us, there is a simple way to
express the Lebesgue-Stieltjes integral in terms of ordinary integrals
and to obtain the properties of the former from the well-known properties
of the latter. After some preparatory remarks we will introduce this
approach, which will enable us to simplify and unify our discussion of
reliability.

A distribution function $P$ of a random variable can always be expressed as a convex sum of distribution functions of three types:

$$P = a_1 P^{\mathrm{abs}} + a_2 P^{\mathrm{dis}} + a_3 P^{\mathrm{sing}}, \tag{2.16}$$

where $0 \le a_i \le 1$ and $a_1 + a_2 + a_3 = 1$. $P^{\mathrm{abs}}$ is called the absolutely continuous part of $P$, $P^{\mathrm{dis}}$ is the discrete part of $P$, and $P^{\mathrm{sing}}$ is the singular part of $P$. The absolutely continuous part $P^{\mathrm{abs}}$ can be differentiated with respect to $x$ (therefore $P^{\mathrm{abs}}$ is a continuous function), so we can write

$$dP^{\mathrm{abs}} = \frac{dP^{\mathrm{abs}}}{dx}\,dx. \tag{2.17}$$


If $P = P^{\mathrm{abs}}$, that is, if the distribution function of the random variable $f$ is absolutely continuous, then

$$dP = \frac{dP}{dx}\,dx$$

and the function

$$p(x) \overset{\mathrm{def}}{=} \frac{dP}{dx} \quad \text{(that is, equal by definition)} \tag{2.18}$$

is called the (probability) density function of the random variable $f$. The usual continuous distribution functions which appear in statistics textbooks are absolutely continuous and therefore possess density functions. The latter are usually the main topic of study rather than the more general but more complicated distribution functions.

The discrete part $P^{\mathrm{dis}}$ of the general distribution function $P$ is a step function with at most a countable number of discontinuities. That is, if the discontinuities of $P^{\mathrm{dis}}$ occur at the numbers $x_k$, $k = \ldots, -2, -1, 0, 1, 2, \ldots$, then there are non-negative constants $b_k$ such that

$$P^{\mathrm{dis}}(x) = b_k \quad \text{if } x_k \le x < x_{k+1}, \tag{2.19}$$

where $0 \le b_k \le 1$.
Figure 2.3 gives an example of such a function.


Notice that sufficiently far to the left in the figure, $P^{\mathrm{dis}}(x) = 0$, and sufficiently far to the right, $P^{\mathrm{dis}}(x) = 1$; this will occur if the number of discontinuities is finite. Otherwise, in accordance with eq. (2.13.3), $P^{\mathrm{dis}}$ need only approach 0 (respectively, 1) in the limit as $x \to -\infty$ (respectively, $x \to +\infty$). In the figure each step is drawn so as to indicate that the left-hand endpoint of the interval is included, whereas the right-hand endpoint is omitted. This means that $P^{\mathrm{dis}}$ is continuous from the right, and is the graphical interpretation of eq. (2.13.2) for $P^{\mathrm{dis}}$.

The quantity

$$c_k \overset{\mathrm{def}}{=} b_k - b_{k-1} = P^{\mathrm{dis}}(x_k) - \lim_{x \nearrow x_k} P^{\mathrm{dis}}(x) \tag{2.20}$$

is the "jump" of the function $P^{\mathrm{dis}}$ for the discontinuity at $x = x_k$.

The third part of the general distribution function, the singular part $P^{\mathrm{sing}}$, is of no practical importance. It is a function that is continuous everywhere and has a derivative equal to zero everywhere except on some subset of the real line whose measure (total length) is 0. It is a remarkable fact that singular distribution functions exist. Such a function $P^{\mathrm{sing}}$ is non-decreasing; $P^{\mathrm{sing}}(-\infty) = 0$; and $P^{\mathrm{sing}}(+\infty) = 1$, which shows that $P^{\mathrm{sing}}(x)$ actually increases as $x$ increases; since its derivative is 0 almost everywhere, $P^{\mathrm{sing}}$ is constant almost everywhere. But it is also continuous; there are no "jumps." Although functions having these unusual properties can be constructed (cp. [15]), they are so complicated and pathological that they cannot play a role in practical applications of probability theory. Therefore singular distributions will be excluded from consideration in what follows: hereafter, a probability distribution will consist of a linear combination of an absolutely continuous probability distribution and a discrete probability distribution.

2.3 We are now prepared to express the Lebesgue-Stieltjes integral $\int_\alpha^\beta g(x)\,dP$ of a function $g$ relative to such a probability distribution in familiar terms. The "integration by parts" formula

$$\int_\alpha^\beta g\,dP = g(x)P(x)\Big|_\alpha^\beta - \int_\alpha^\beta P\,dg \tag{2.21}$$


is valid for the Lebesgue-Stieltjes integral [12]. We will use it to define that integral in terms of the familiar integral for functions $g$ which are differentiable. Thus, if $dg/dx$ exists, define

$$\int_\alpha^\beta g\,dP = g(x)P(x)\Big|_\alpha^\beta - \int_\alpha^\beta P(x)\,\frac{dg}{dx}\,dx. \tag{2.22}$$

The integral on the right side is a conventional (Lebesgue or Riemann) integral. All the properties of the Lebesgue-Stieltjes integral can be obtained by interpreting the left side of eq. (2.22) in terms of the right side.

In particular, if $P^{\mathrm{dis}}$ is the discrete distribution given by eq. (2.19) and if $x_{M-1} \le \alpha < x_M$, $x_N \le \beta < x_{N+1}$, then

$$\int_\alpha^\beta g\,dP^{\mathrm{dis}} = \sum_{k=M}^{N} c_k\, g(x_k), \tag{2.23}$$

where $c_k$ is the jump of $P^{\mathrm{dis}}$ at the discontinuity $x_k$. This formula is verified by a simple calculation using eq. (2.22). Indeed, since $P(\beta) = b_N$ and $P(\alpha) = b_{M-1}$, by eq. (2.19),

$$g(x)P(x)\Big|_\alpha^\beta = b_N\, g(\beta) - b_{M-1}\, g(\alpha). \tag{2.24}$$


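The last integral in eq. (2.22), $\int_\alpha^\beta P(x)\,\frac{dg}{dx}\,dx$, can be expressed as a sum of three parts (cp. Figure 2.4), on each of which $P^{\mathrm{dis}}$ is piecewise constant:

Figure 2.4. Calculation of Lebesgue-Stieltjes Integrals

$$\int_\alpha^{x_M} P(x)\,\frac{dg}{dx}\,dx = b_{M-1}\big(g(x_M) - g(\alpha)\big), \tag{2.26.1}$$

$$\int_{x_N}^{\beta} P(x)\,\frac{dg}{dx}\,dx = b_N\big(g(\beta) - g(x_N)\big), \tag{2.26.2}$$

$$\int_{x_M}^{x_N} P(x)\,\frac{dg}{dx}\,dx = \sum_{k=M}^{N-1} b_k\big(g(x_{k+1}) - g(x_k)\big). \tag{2.26.3}$$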

Substitution of eqs. (2.24) and (2.26) in eq. (2.22) yields

$$\begin{aligned}
\int_\alpha^\beta g\,dP &= b_N\, g(\beta) - b_{M-1}\, g(\alpha) - b_{M-1}\big(g(x_M) - g(\alpha)\big) - b_N\big(g(\beta) - g(x_N)\big) - \sum_{k=M}^{N-1} b_k\big(g(x_{k+1}) - g(x_k)\big) \\
&= b_N\, g(x_N) - b_{M-1}\, g(x_M) + \sum_{k=M}^{N-1} b_k\, g(x_k) - \sum_{k=M}^{N-1} b_k\, g(x_{k+1}) \\
&= b_N\, g(x_N) - b_{M-1}\, g(x_M) + \sum_{k=M}^{N-1} b_k\, g(x_k) - \sum_{k=M+1}^{N} b_{k-1}\, g(x_k)
\end{aligned}$$

(by relabelling the summation index in the last sum),

$$= \sum_{k=M}^{N} (b_k - b_{k-1})\, g(x_k) = \sum_{k=M}^{N} c_k\, g(x_k),$$

where $c_k = b_k - b_{k-1}$ is the jump of $P^{\mathrm{dis}}$ at $x_k$. Thus the Lebesgue-Stieltjes integral with respect to a discrete distribution reduces to the usual series sum. This means that both absolutely continuous and discrete distributions can be treated simultaneously and in a uniform manner.
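The reduction just derived can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-step discrete distribution (the points `xs` and jumps `cs` are illustrative values, not data from this report) and the test function $g = \sin$. It evaluates the right side of eq. (2.22) by the midpoint rule and compares it with the sum of eq. (2.23).

```python
import math

# Hypothetical discrete distribution: jumps c_k located at points x_k.
xs = [1.0, 2.0, 3.5]
cs = [0.2, 0.5, 0.3]


def P_dis(x):
    """Right-continuous step function of eq. (2.19): sum of jumps at x_k <= x."""
    return sum(c for xk, c in zip(xs, cs) if xk <= x)


def stieltjes_by_parts(g, dg, P, alpha, beta, n=40_000):
    """Right side of eq. (2.22): boundary term minus an ordinary integral.

    The ordinary integral of P(x) * g'(x) is approximated by the midpoint rule.
    """
    h = (beta - alpha) / n
    ordinary = sum(
        P(alpha + (i + 0.5) * h) * dg(alpha + (i + 0.5) * h) for i in range(n)
    ) * h
    return g(beta) * P(beta) - g(alpha) * P(alpha) - ordinary


by_parts = stieltjes_by_parts(math.sin, math.cos, P_dis, 0.0, 4.0)
jump_sum = sum(c * math.sin(xk) for xk, c in zip(xs, cs))  # eq. (2.23)
print(by_parts, jump_sum)
```

The two printed values should agree to within the quadrature error, which illustrates that eq. (2.22) really does assign the weight $c_k$ to each discontinuity.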


Mixed distributions, that is, distributions which have both an absolutely continuous and a discrete component, are not uncommon. For instance, the failure distribution for light bulbs is of this type. Another example, more closely related to the main theme of this work, is provided by jet aircraft engines. In particular, the failure distribution of turbine blades can be considered as a combined distribution. The absolutely continuous part is associated with failures which occur as a function of operating time or wear, and the discrete part is associated with the periodic stresses due to rapid temperature changes which occur in the blades during take-off operations.


Although  a discrete  distribution  does  not  have  a density  function, 
there  is  the  useful  notion  of  a generalised  density  function  which  makes 
it  possible  to  study  densities  of  combined  distribution  functions  in  a 
unified  way.  We  will  use  this  concept  in  Section  3 and  again  in  Sec- 
tion 6,  but  this  is  the  logical  place  to  introduce  it. 


Let $\delta(x)$ denote the Dirac delta function, a generalized function characterized by the property that if $\alpha < x_0 < \beta$ and $g(x)$ is a function, then

$$\int_\alpha^\beta g(x)\,\delta(x - x_0)\,dx = g(x_0). \tag{2.27}$$

If $c_k$ denotes the "jump" in $P^{\mathrm{dis}}(x)$ at the discontinuity $x = x_k$, then

$$\int_\alpha^\beta g(x) \sum_{k=M}^{N} c_k\,\delta(x - x_k)\,dx = \sum_{k=M}^{N} c_k \int_\alpha^\beta g(x)\,\delta(x - x_k)\,dx = \sum_{k=M}^{N} c_k\, g(x_k); \tag{2.28}$$

hence, by eq. (2.23),

$$\int_\alpha^\beta g\,dP^{\mathrm{dis}} = \int_\alpha^\beta g(x) \sum_{k=M}^{N} c_k\,\delta(x - x_k)\,dx, \tag{2.29}$$

so

$$\frac{dP^{\mathrm{dis}}}{dx} \overset{\mathrm{def}}{=} \sum_k c_k\,\delta(x - x_k) \tag{2.30}$$

can be thought of as the generalized density function corresponding to the discrete distribution $P^{\mathrm{dis}}$. This extension of the notion of density function makes it possible to study densities of combinations of absolutely continuous and discrete distributions in a uniform way.


2.4 The notion of independence of random variables will be of special importance in what follows because it will provide the means for reducing complex problems to tractable components. If $U_i$, $i = 1, 2, \ldots$ is a sequence of sets, $\Omega_i$ a collection of events on $U_i$, $P_i$ a probability measure on $\Omega_i$, and $f_i$ a random variable on $U_i$, then the products

$$U = U_1 \times U_2 \times \cdots, \tag{2.31.1}$$

$$\Omega = \Omega_1 \times \Omega_2 \times \cdots, \tag{2.31.2}$$

$$P = P_1 \times P_2 \times \cdots, \tag{2.31.3}$$

define a collection of events $\Omega$ on $U$ and an associated probability measure $P$. Notice that $P$ is merely a probability measure on sets and does not have anything to do with a particular random variable. The random variable $f_i$ defined on $U_i$ can also be considered as a random variable on the product set $U$ by defining

$$f_i(\xi_1, \xi_2, \ldots, \xi_i, \ldots) = f_i(\xi_i). \tag{2.32}$$

If $f$ is a random variable on $U$ such that its distribution function $P_f$ is the product of the distribution functions $P_{f_i}$ of the random variables $f_i$ on $U$, that is, if

$$P_f(\xi_1, \xi_2, \ldots) = P_{f_1}(\xi_1)\,P_{f_2}(\xi_2) \cdots, \tag{2.33}$$

then the $f_i$ are said to be independent random variables. If $f_1, f_2, \ldots$ are independent random variables, then the mean of $f = f_1 f_2 \cdots$ is given by (cp. eq. (2.9))

$$\langle f_1 f_2 \cdots \rangle = \int f\,dP = \langle f_1 \rangle \langle f_2 \rangle \cdots; \tag{2.34}$$

that is, the mean value of a product of independent random variables is the product of their mean values.


2.5 In the theory of reliability and maintenance the notion of the conditional probability of survival plays a central role. If $\omega_1$ and $\omega_2$ are two events and if $P(\omega_1) > 0$, that is, the probability of event $\omega_1$ is positive, then the conditional probability of $\omega_2$ given $\omega_1$ is

$$P(\omega_2 \mid \omega_1) \overset{\mathrm{def}}{=} \frac{P(\omega_2 \cap \omega_1)}{P(\omega_1)}. \tag{2.35}$$

If areas of sets are used to represent probabilities, then, in Figure 2.5, $P(\omega_2 \mid \omega_1)$ can be interpreted as the ratio of the area of the region $\omega_2 \cap \omega_1$ to the area of the region $\omega_1$.

Figure 2.5. Conditional Probability

Related to this interpretation of conditional probability is the important Bayes' Principle of Inverse Probability. Suppose that $\omega_1, \omega_2, \omega_3$ are three events in $U$ which have a non-empty intersection. The situation is depicted in Figure 2.6.

[Figure 2.6: three overlapping events $\omega_1$, $\omega_2$, $\omega_3$; the common region is labelled $I = \mathrm{Area}(\omega_1 \cap \omega_2 \cap \omega_3)$.]

Continuing to interpret the probability of an event $\omega$ as the "area" of the set $\omega$ in the figure, let

$$I = \mathrm{Area}(\omega_1 \cap \omega_2 \cap \omega_3) = P(\omega_1 \cap \omega_2 \cap \omega_3),$$

$$J = \mathrm{Area}(\omega_1 \cap \omega_2) = P(\omega_1 \cap \omega_2),$$

$$K = \mathrm{Area}(\omega_2 \cap \omega_3) = P(\omega_2 \cap \omega_3).$$

The numerical ratio $I/(J \cdot K)$ can be expressed in two different ways:

$$I/(J \cdot K) = (I/J)/K = (I/K)/J, \tag{2.36}$$

that is,

$$P(\omega_1 \mid \omega_2 \cap \omega_3)/P(\omega_1 \cap \omega_2) = P(\omega_3 \mid \omega_1 \cap \omega_2)/P(\omega_2 \cap \omega_3),$$

or equivalently,

$$P(\omega_1 \mid \omega_2 \cap \omega_3) = \frac{P(\omega_3 \mid \omega_1 \cap \omega_2)\,P(\omega_1 \cap \omega_2)}{P(\omega_2 \cap \omega_3)}. \tag{2.37}$$

In order to interpret eq. (2.36) in a manner appropriate for our
later  needs,  we  will  discuss  two  types  of  events:  observations  and 
hypotheses.  Statistical  inference  generally  proceeds  from  a collection 
of  hypotheses,  whose  probabilities  are  assumed  known,  to  assessments  or 
predictions  of  the  probability  of  various  observations.  This  procedure 
can  be  inverted  to  provide  assessments  of  the  probability  of  various 
hypotheses  when  a collection  of  observations  is  given.  Adopting  the 
latter viewpoint, let $\{\omega_i : i \in J\}$ be a fixed collection of hypotheses and set

$$\omega = \bigcap_{i \in J} \omega_i; \tag{2.38}$$

$\omega$ is the event which corresponds to the simultaneous validity of all hypotheses $\omega_i$. Let $H$ denote another hypothesis and $o$ an observation. Then eq. (2.37) can be rewritten, using this notation, as

$$P(H \mid o \cap \omega) = \frac{P(o \mid H \cap \omega)\,P(H \cap \omega)}{P(o \cap \omega)}. \tag{2.39}$$
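The identity (2.37), and with it eq. (2.39), can be verified on a small finite universe in which probability is proportional to the number of elements. The sets below are hypothetical and chosen only so that the triple intersection is non-empty; `w1`, `w2`, `w3` play the roles of $H$, $\omega$, and $o$ respectively.

```python
from fractions import Fraction

# Hypothetical universe of 12 equally likely items.
U = set(range(12))
w1 = {0, 1, 2, 3, 4, 5}        # plays the role of the hypothesis H
w2 = {2, 3, 4, 5, 6, 7, 8}     # plays the role of the fixed hypotheses omega
w3 = {4, 5, 6, 9}              # plays the role of the observation o


def P(event):
    """Counting probability: fraction of the universe occupied by the event."""
    return Fraction(len(event), len(U))


def cond(a, b):
    """Conditional probability P(a | b) as defined in eq. (2.35)."""
    return P(a & b) / P(b)


lhs = cond(w1, w2 & w3)                                   # P(w1 | w2 n w3)
rhs = cond(w3, w1 & w2) * P(w1 & w2) / P(w2 & w3)         # right side of (2.37)
print(lhs, rhs)
```

Both sides reduce to the same ratio $I/K$, here $2/3$, which is the symmetry expressed by eq. (2.36).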


The quantity $P(H \mid o \cap \omega)$ is called the likelihood ratio for the hypothesis $H$ given the observation $o$ and the fixed collection of hypotheses $\{\omega_i : i \in J\}$. The likelihood ratio is proportional to the probability of the observation given the hypothesis $H$ (and $\omega$) multiplied by the a priori probability of $H$ (and $\omega$). The factor of proportionality is independent of the collection of alternative hypotheses $H$ under consideration. Therefore, it is reasonable to select that $H$ from among a collection of alternative hypotheses for which the product $P(o \mid H \cap \omega)\,P(H \cap \omega)$, and hence the likelihood ratio $P(H \mid o \cap \omega)$, is maximal. This is Bayes' Principle. It can be considered as a generalization of the well-known Maximum-Likelihood method of estimation of parameters [4], [13]. In the latter, the event $H$ is a set of values of the parameters of a probability distribution, $\omega = H \cup o$, and $o$ is the event which consists of independent observations $x_1, x_2, \ldots, x_n$ of a random variable $x$. If it is assumed that $P(H)$ is independent of $H$, then the likelihood ratio is proportional to

$$P(o \mid H) = \prod_{i=1}^{n} P(x_i \mid H), \tag{2.40}$$

where the right-hand side uses a suggestive if not quite exact notation. The right-hand side is called the likelihood function. Bayes' Principle applied to this special case is the Maximum-Likelihood method.
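As an illustration of eq. (2.40), the sketch below selects, from a finite list of candidate hypotheses, the one that maximizes the likelihood function. Everything specific here is assumed for the example: the observations are invented failure times, each hypothesis $H$ is a candidate rate $\lambda$ of an exponential density $\lambda e^{-\lambda x}$, and densities stand in for the probabilities $P(x_i \mid H)$ in the "suggestive" sense mentioned above.

```python
import math

# Hypothetical observed failure times (hours) and candidate hypotheses H,
# each hypothesis being a value of the exponential rate parameter.
observations = [120.0, 340.0, 95.0, 410.0, 230.0]
candidate_rates = [0.001, 0.002, 0.004, 0.008]


def log_likelihood(rate, xs):
    """Logarithm of prod_i rate * exp(-rate * x_i), the likelihood of eq. (2.40)."""
    return sum(math.log(rate) - rate * x for x in xs)


# With P(H) the same for every H, Bayes' Principle selects the hypothesis
# with the largest likelihood.
best_rate = max(candidate_rates, key=lambda r: log_likelihood(r, observations))
print(best_rate)
```

Working with the logarithm of the product avoids numerical underflow and does not change which hypothesis is maximal.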

Notice that Bayes' Principle is a consequence of the symmetry inherent in the definition of conditional probability (as exhibited in eq. (2.36) and the triple intersection displayed in Figure 2.6), and the symmetrical interpretation of hypotheses and observations as events. Thus there is a certain degree of interchangeability of hypotheses and observations. Hypotheses which remain unchallenged by observations assume, as experience accumulates, the role and properties of observations themselves, and observations (considered as events) can be converted to hypotheses in the right circumstances.

This  interchangeability,  or  substitutability,  plays  an  unheralded 
but  substantial  role  in  the  practical  analysis  of  the  reliability  of 
rapidly  evolving  complex  systems,  for  which  only  small  sample  observa- 
tions can  ever  be  available.  Modern  commercial  and  military  aircraft 
provide  an  example.  The  relatively  small  production  runs  and  the  very 
small  number  of  aircraft  of  any  one  type  which  reach  high  operating 
times  preclude  the  possibility  of  collecting  extensive  actuarial  data 
for  the  assessment  of  hypotheses  concerning  reliability.  This  difficulty 
is mitigated to some extent by making hypotheses (concerned, e.g., with Hard Time maintenance intervals) based upon prior experience with similar although by no means identical equipment. In this way prior limited observational experience is transformed into current working hypotheses against which current observations, limited though they may be, are tested. In turn, these observations form the foundation for, and in the sense described above, are equivalent to, future hypotheses. Although this application of Bayes' Principle is rarely made explicit and quantitative (one speaks instead of the need for "experienced" reliability analysts), it nevertheless plays a major role in the practical analysis of complex systems which evolve with time, have a relatively brief life, and of which only a small number of replicas are fabricated.

3.  TERMINOLOGY OF RELIABILITY THEORY

3.1 The basic concept in reliability theory is that of the probability of failure or, if one prefers a more sanguine outlook, the probability of survival, frequently called the reliability. For this application we may think of the set $U$ as a universe of items or components $\xi$ whose failure characteristics are of interest to us. The subsets of $U$ which constitute events consist of those $\xi \in U$ which have failed prior to a given time. Thus, if $t$ denotes time, then the collection $\Omega$ of events consists of the subsets

$$\omega(t) = \{\xi \in U : \xi \text{ has failed prior to } t\}; \qquad \Omega = \{\omega(t) : t \in \mathbb{R}\}. \tag{3.1}$$

Recall that $\mathbb{R}$ denotes the set of all real numbers. It is evident that $t_1 < t_2$ implies $\omega(t_1) \subset \omega(t_2)$, since each item which failed prior to $t_1$ certainly failed prior to $t_2$. This is illustrated in Figure 3.1, which is the same as Figure 2.2 although it has a different interpretation. In this way the collection of events $\Omega$ is parametrized by the real-valued time variable.

Figure 3.1. Sets of Failed Items ($t_1 < t_2$)


Associated with the universe $U$ of items and the collection $\Omega$ of events is a probability measure $F$ which expresses the probability of failure corresponding to events $\omega \in \Omega$. (We may think of $F(\omega(t))$ as the "area" occupied by the event $\omega(t)$ if we interpret probabilities as areas (e.g., in Figure 3.1) and recall that the total "area" of $U$ itself, considered as an event in $\Omega$, must be equal to 1.) With this interpretation, $F(\omega(t))$ is the probability of failure prior to time $t$; the probability of survival associated with the event $\omega = \omega(t)$ is

$$R(\omega(t)) = 1 - F(\omega(t)), \tag{3.2}$$

so $R(\omega(t))$ is the probability of survival until time $t$, also called the reliability. $R(\omega(t))$ can be interpreted as the "area" of $U - \omega(t)$.

If $U$ consists of $N$ items, if the number of items in $\omega(t)$ is $N(t)$, and if the measure $F$ is counting measure, then, since $N(t)$ is the number of items which failed prior to $t$,

$$F(\omega(t)) = \frac{N(t)}{N}. \tag{3.3}$$
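Equations (3.2) and (3.3) translate directly into a computation on a list of recorded failure times. The sketch below assumes a hypothetical universe of eight items (the times are illustrative, not data from this report) and treats "failed prior to $t$" as a strict inequality.

```python
# Hypothetical failure times, in operating hours, for a universe of N = 8 items.
failure_times = [50, 120, 120, 300, 450, 700, 900, 1500]


def F(t):
    """Eq. (3.3): fraction N(t)/N of items that failed prior to time t."""
    failed = sum(1 for x in failure_times if x < t)
    return failed / len(failure_times)


def R(t):
    """Eq. (3.2): probability of survival until time t."""
    return 1.0 - F(t)


print(F(200), R(200))
```

Here three of the eight items have failed before $t = 200$, so the sketch reports a failure probability of 0.375 and a reliability of 0.625.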


In order to transfer the above notions from the realm of sets to the realm of numbers, where the methods of calculus can be applied, we use the indicator function defined by eq. (2.7) to obtain a failure distribution function. Recall that

$$c_{\omega(t)}(\xi) = \begin{cases} 1 & \text{if } \xi \in \omega(t), \\ 0 & \text{if } \xi \notin \omega(t). \end{cases} \tag{3.4}$$

The function which assigns to each event $\omega(t)$ the number $t$ is a random variable. The corresponding distribution function associated with the failure probability measure $F$ is, by eq. (2.12),

$$F(t) = \int_{\omega(t)} dF = \int c_{\omega(t)}\,dF = F(\omega(t)). \tag{3.5}$$

Thus the failure distribution $F(t)$ is merely the probability measure $F$ considered as a function of the time parameter $t$.

We can calculate the survival distribution $R(t)$ in a similar way, but it is easier to use eq. (3.2) directly to obtain

$$R(t) = 1 - F(t). \tag{3.6}$$


Notice that $F$ really is a probability distribution in the sense specified by eq. (2.13), but that $R(t)$ has slightly different properties:

$$R(t) \text{ is non-increasing}; \tag{3.7.1}$$

$$R(t) = R(t + 0); \tag{3.7.2}$$

$$R(-\infty) = 1, \quad R(+\infty) = 0; \tag{3.7.3}$$

each of these properties follows immediately from the defining relation eq. (3.6) and the corresponding eq. (2.13). $R(t)$ will be referred to as a survival distribution even though it is not a distribution in the technical sense.

The graph of the function $t \mapsto R(t)$ is called a survival curve. Figure 4.2 of Section 4 exhibits a typical survival curve for an aircraft gas turbine engine.

In practice, measurements and observations are always discrete and finite in number. This means that actual worldly knowledge of survival and other probability distributions only supplies an approximation which (may be exact and) is a discrete distribution. On the other hand, theory and philosophical beliefs about the nature of reality often suggest that observations are discrete sets of values drawn from absolutely continuous distributions or from combinations of absolutely continuous and discrete distributions; moreover, the techniques of mathematical analysis are more highly developed for studying absolutely continuous distributions. Consequently, whenever it is possible to do so, it is desirable to suppose that observations have been drawn from ideal and hypothetical absolutely continuous distributions. The density functions corresponding to these distributions play a central role in most developments of the subject. The absolutely continuous distributions which have been found to be most useful in practice and are most extensively studied by theorists will be introduced in Section 4.

The generalized density function corresponding to the failure distribution $F(t) = F_f(t)$ (which may consist of both an absolutely continuous part and a discrete part) is denoted $p(t)$; that is,

$$p(t) = \frac{dF(t)}{dt} \tag{3.8}$$

is the failure probability density. Equation (2.30) must be used to express the generalized density function for the discrete part of $F$. From eq. (3.6) we see that the survival probability density is given by

$$\frac{dR(t)}{dt} = -p(t). \tag{3.9}$$

Corresponding to survival and failure distributions are conditional survival and conditional failure distributions. First consider the conditional probability of survival. According to eq. (2.35), the conditional probability of the event $\omega_2$ given $\omega_1$ is

$$R(\omega_2 \mid \omega_1) = \frac{R(\omega_2 \cap \omega_1)}{R(\omega_1)}. \tag{3.10}$$

For distributions parametrized by survival time we consider two times, $t_1 < t_2$. Then the definition of $\omega(t)$, eq. (3.1), implies $\omega(t_1) \subset \omega(t_2)$, so the complementary events satisfy the reverse inclusion, i.e.,

$$U - \omega(t_1) \supset U - \omega(t_2).$$

The formula $R(t) = 1 - F(t)$ implies that $R(t) = R(U - \omega(t))$, where $R$ applied to a set of surviving items denotes the probability measure of that set. Introduce $\omega_1 = U - \omega(t_1)$, $\omega_2 = U - \omega(t_2)$. Then $\omega_2 \subset \omega_1$: items in $\omega_2$ have survived at least until $t_2$ whereas items in $\omega_1$ have survived at least until $t_1$ (cp. Figure 3.2). Now we can compute the conditional probability that an item will survive at least until $t_2$ given that it has survived until $t_1$, where $t_1 < t_2$. From eq. (3.10), this is

Figure 3.2. Conditional Probability of Survival

$$R(t_2 \mid t_1) = R(\omega_2 \mid \omega_1) = \frac{R(\omega_2 \cap \omega_1)}{R(\omega_1)} = \frac{R(\omega_2)}{R(\omega_1)} \ \ (\text{since } \omega_2 \subset \omega_1) \ = \frac{R(t_2)}{R(t_1)}. \tag{3.11}$$

We are to understand that $t_1$, and hence the condition $\omega(t_1)$, is held fixed and only $t_2$ varies (through values greater than $t_1$). The expression eq. (3.11) for the conditional probability amounts to the same as the assumption that the universe of items has been reduced from $U$ to $\omega_1$ (cp. Figure 3.2), and that the probabilities have been renormalized by division by $R(\omega_1)$ so that the total measure of $\omega_1$ is adjusted to equal 1.

The conditional survival density is therefore obtained by differentiating the numerator of eq. (3.11) at $t_2 = t$, which yields

$$\frac{dR(t \mid t_1)}{dt} = \frac{1}{R(t_1)}\,\frac{dR(t)}{dt} = -\frac{p(t)}{R(t_1)}. \tag{3.12}$$

The conditional probability of failure must be treated slightly differently, since in order to fail during the interval $(t_1, t_2)$ an item must first have survived until $t_1$. Therefore, the conditional probability of failure prior to $t_2$ given survival until $t_1$ is

$$F(t_2 \mid t_1) = \frac{F(t_2) - F(t_1)}{1 - F(t_1)} = \frac{F(t_2) - F(t_1)}{R(t_1)}; \tag{3.13}$$

the corresponding density at $t_2 = t_1 = t$ is usually called the hazard rate (also often the failure rate) and is expressed by

$$h(t) = \frac{p(t)}{R(t)}. \tag{3.14}$$
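The conditional quantities of eqs. (3.11) and (3.13) are simple ratios, as the following sketch shows. The survival distribution used here is an assumption made only for illustration (a Weibull-type form with made-up `scale` and `shape` values); any $R(t)$ with $R(0) = 1$ could be substituted.

```python
import math


def R(t, scale=1000.0, shape=2.0):
    """Hypothetical survival distribution with R(0) = 1 (Weibull-type form)."""
    return math.exp(-((t / scale) ** shape))


def F(t):
    """Failure distribution, eq. (3.6) rearranged."""
    return 1.0 - R(t)


def conditional_survival(t2, t1):
    """Eq. (3.11): R(t2 | t1) = R(t2) / R(t1), for t1 <= t2."""
    return R(t2) / R(t1)


def conditional_failure(t2, t1):
    """Eq. (3.13): F(t2 | t1) = (F(t2) - F(t1)) / R(t1), for t1 <= t2."""
    return (F(t2) - F(t1)) / R(t1)


cs = conditional_survival(1500.0, 1000.0)
cf = conditional_failure(1500.0, 1000.0)
print(cs, cf, cs + cf)
```

The two conditional probabilities sum to 1, which reflects the renormalization of the reduced universe $\omega_1$ described above.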


By utilizing eq. (3.9) the hazard rate can be expressed in terms of the survival distribution as

$$h(t) = -\frac{1}{R(t)}\,\frac{dR(t)}{dt} = -\frac{d}{dt}\log_e R(t) \tag{3.15}$$

(where $\log_e$ denotes the natural logarithm function), and the survival distribution is given in terms of the hazard rate by

$$R(t) = \exp\left(-\int_0^t h(s)\,ds\right) \tag{3.16}$$

(where $\exp x = e^x$). Formulas (3.15) and (3.16) are valid for absolutely continuous survival distributions. For discrete distributions eq. (3.15) must be replaced by the corresponding finite-difference formulation.
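The pair of formulas (3.15) and (3.16) can be exercised numerically: differentiate $-\log_e R$ to obtain the hazard rate, then integrate the hazard rate to recover $R$. The sketch below assumes a hypothetical Weibull-type survival distribution with $R(0) = 1$; the finite-difference step `dt` and the number of quadrature cells `n` are arbitrary choices.

```python
import math


def R(t, scale=1000.0, shape=1.5):
    """Hypothetical survival distribution with R(0) = 1 (Weibull-type form)."""
    return math.exp(-((t / scale) ** shape))


def hazard(t, dt=1e-3):
    """Eq. (3.15): h(t) = -d/dt log R(t), by a central difference (needs t > dt)."""
    return -(math.log(R(t + dt)) - math.log(R(t - dt))) / (2.0 * dt)


def survival_from_hazard(t, n=2000):
    """Eq. (3.16): R(t) = exp(-integral_0^t h(s) ds), by the midpoint rule."""
    step = t / n
    integral = sum(hazard((i + 0.5) * step) for i in range(n)) * step
    return math.exp(-integral)


recovered = survival_from_hazard(800.0)
print(recovered, R(800.0))
```

The recovered value agrees with the original $R(800)$ up to discretization error, illustrating that the hazard rate and the survival distribution carry the same information.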

Suppose that an item consists of various parts, and survives only if all of its parts survive. If the survival distribution of the item is $R(t)$ and that of the $k$th part is $R_k(t)$, and if failure of the various parts is due to independent causes, then

$$R(t) = \prod_k R_k(t). \tag{3.17}$$

In this case the hazard rate is

$$h(t) = -\frac{d}{dt}\left[\log_e \prod_k R_k(t)\right] = -\sum_k \frac{d}{dt}\log_e R_k(t),$$

that is,

$$h(t) = \sum_k h_k(t), \tag{3.18}$$

where

$$h_k(t) = -\frac{d}{dt}\log_e R_k(t) \tag{3.19}$$

is the hazard rate of the $k$th part. Thus, the hazard rate is additive for independent causes of failure. This result is valid for discrete as well as absolutely continuous survival distributions. This convenient property permits independent assessment of constituent hazard rates and provides a simple method for combining them, by means of eqs. (3.18) and (3.16), to recover $R(t)$ itself.
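Additivity of hazard rates can be confirmed with a short computation. The sketch below assumes three hypothetical parts with constant hazard rates (the numbers in `part_rates` are illustrative), forms the item's survival distribution by eq. (3.17), and differentiates its logarithm numerically as in eq. (3.15).

```python
import math

# Hypothetical constant hazard rates h_k (per hour) for three independent parts.
part_rates = [0.001, 0.0005, 0.002]


def R_item(t):
    """Eq. (3.17): the item survives only if every part survives."""
    return math.prod(math.exp(-rate * t) for rate in part_rates)


def hazard(R, t, dt=1e-4):
    """Eq. (3.15): h(t) = -d/dt log R(t), by a central difference."""
    return -(math.log(R(t + dt)) - math.log(R(t - dt))) / (2.0 * dt)


item_hazard = hazard(R_item, 100.0)
sum_of_part_hazards = sum(part_rates)  # eq. (3.18)
print(item_hazard, sum_of_part_hazards)
```

The numerically computed hazard rate of the item matches the sum of the part hazard rates, as eq. (3.18) asserts.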


3.2 If $R$ is a survival distribution, then the area under the graph of $t \mapsto R(t)$ is the mean lifetime of the items $\xi \in U$. In order to adapt our notation to items which begin life at $t = 0$, let us suppose that $R$ is defined on $[0, \infty)$ and that $R(0) = 1$, $\lim_{t \to \infty} tR(t) = 0$. The area under the graph is $\int_0^\infty R(t)\,dt$. Integration by parts yields

$$\int_0^\infty R(t)\,dt = tR(t)\Big|_0^\infty - \int_{t=0}^{t=\infty} t\,dR = -\int_1^0 t\,dR = \int_0^1 t\,dR,$$

since $\lim_{t \to \infty} tR(t) = 0$ by hypothesis and $R$ varies from 1 to 0 as $t$ varies from 0 to $\infty$. Now recall from eq. (2.9) that $\int_0^1 t\,dR$ is the mean value of the random variable $t$, thus the mean time before failure. Graphically this result amounts to nothing more than evaluating the area under the graph of $t \mapsto R(t)$ by integrating along the $R$-axis as indicated in Figure 3.3.

Figure 3.3. Calculation of Mean Time Before Failure
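The statement that the area under the survival curve equals the mean time before failure can be checked for a case where the mean is known in closed form. The sketch below assumes an exponential survival distribution with a hypothetical rate, for which the mean lifetime is the reciprocal of the rate; the truncation point `t_max` and the cell count `n` are arbitrary numerical choices.

```python
import math

RATE = 0.002  # hypothetical failure rate per hour; the mean lifetime is 1 / RATE


def R(t):
    """Exponential survival distribution with R(0) = 1."""
    return math.exp(-RATE * t)


def area_under_survival_curve(t_max=20_000.0, n=200_000):
    """Midpoint-rule approximation of integral_0^t_max R(t) dt.

    t_max is chosen large enough that the neglected tail is negligible.
    """
    step = t_max / n
    return sum(R((i + 0.5) * step) for i in range(n)) * step


mean_lifetime = area_under_survival_curve()
print(mean_lifetime, 1.0 / RATE)
```

The computed area is close to 500 hours, the mean of an exponential distribution with this rate.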


4.  USEFUL SURVIVAL DISTRIBUTIONS

The probability-of-survival distributions most commonly used in the practical analysis of reliability data are also among those distributions which have been most intensively studied by theoreticians. They are the

1) Exponential,
2) Normal (also called Gaussian),
and 3) Weibull

distributions. In addition, the

4) Lognormal
and 5) Gamma

distributions have played significant roles. We will define each of these and derive the corresponding density and hazard functions. Since all of these distributions are absolutely continuous, the usual techniques of the calculus can be employed.

It will be assumed hereafter that a survival distribution is defined on some closed half-infinite interval, which will generally be $0 \le t < \infty$.

4.1  Exponential Survival Distribution

For this distribution the probability of survival to time $t$ is

$$R(t) = \exp(-\lambda t), \quad \lambda \ge 0, \ t \ge 0. \tag{4.1}$$

Observe that $\lim_{t \to \infty} R(t) = R(\infty) = 0$ implies $\lambda > 0$. Figure 4.1.1 illustrates the graph of a typical exponential distribution.

The exponential survival density corresponding to eq. (4.1) is

$$p(t) = -\frac{dR}{dt} = \lambda \exp(-\lambda t), \tag{4.2}$$

and the hazard rate is

$$h(t) = -\frac{d \log_e R(t)}{dt} = \lambda; \tag{4.3}$$

cp. Figures 4.1.2 and 4.1.3.

Figure 4.2. Typical Exponential Survival Distribution: J65-W-3 Jet Engine (semi-logarithmic graph paper). Horizontal axis: age (in flying hours).
SOURCE: United States Air Force, Procedures for Determining Aircraft Engine (Propulsion Unit) Failure Rates, Actuarial Engine Life, and Forecasting Monthly Engine Changes by the Actuarial Method, Technical Order TO 00-25-128, October 20, 1959.

Figure 4.2 displays survival data for the J65-W-3 jet engine. Semi-logarithmic graph paper is used so that the graph of an exponential distribution appears as a straight line. In this example the data points lie close to the line shown: the underlying distribution can be accurately approximated by an exponential. Were the exponential of the form eq. (4.1), then we would find $R(0) = 1$, but the data indicate that at $t = 0$ (that is, upon initial operation) approximately 6% of the items were found to be in a failed condition. The variety of potential meanings and definitions of the term "failed condition" have been explored at length earlier in this volume. Regardless of the precise meaning attributed to the term, the phenomenon can be interpreted as reflecting manufacturing defects which have escaped test procedures as well as failures induced by pre-operational tests or aspects of the production process itself which are not detectable (or at least not detected) until initial operation is attempted at $t = 0$. This phenomenon is accommodated in the mathematical formalism by the simple expedient of replacing the time variable $t$ by $t + t_0$, where $t_0$ can be thought of as corresponding to the duration of pre-operational exposure of the item. This problem is not confined to exponentially failing items; it is found for all types of distributions. The solution is always the same: renormalization of the zero time by replacement of $t$ by $t + t_0$ for an appropriate positive $t_0$. Thus the renormalized exponential distribution (also called the "shifted" exponential distribution) is

$$R(t) = \exp(-\lambda (t + t_0)) \tag{4.4}$$

with corresponding density

$$p(t) = \lambda \exp(-\lambda (t + t_0)) \tag{4.5}$$

and hazard rate

$$h(t) = \lambda. \tag{4.6}$$

Note the important fact that the hazard rate is independent of $t_0$.
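The shifted exponential of eqs. (4.4) to (4.6) can be sketched as follows. The rate `RATE` is a hypothetical value, not one read from Figure 4.2; only the 6% initial failed fraction comes from the text. Under that assumption, $t_0$ is fixed by requiring $R(0) = 0.94$.

```python
import math

RATE = 0.001                    # hypothetical failure rate per flying hour
INITIAL_FAILED_FRACTION = 0.06  # about 6% found failed at t = 0, as in the text

# Choose t0 so that R(0) = exp(-RATE * t0) = 1 - INITIAL_FAILED_FRACTION.
t0 = -math.log(1.0 - INITIAL_FAILED_FRACTION) / RATE


def R(t):
    """Eq. (4.4): shifted exponential survival distribution."""
    return math.exp(-RATE * (t + t0))


def p(t):
    """Eq. (4.5): corresponding density."""
    return RATE * math.exp(-RATE * (t + t0))


def hazard(t):
    """Eq. (3.14) applied to eqs. (4.4) and (4.5); constant, as in eq. (4.6)."""
    return p(t) / R(t)


print(t0, R(0.0), hazard(500.0))
```

Whatever value of $t_0$ results, the ratio $p(t)/R(t)$ is the constant rate, so the shift changes the intercept of the survival curve but not its slope on semi-logarithmic paper.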

The  exponential  distribution  plays  a special  role  in  the  theory  of 
reliability  for  two  quite  different  reasons.  The  first  is  a practical 
one:  it  has  been  found  that  an  exponential  distribution  characterizes 
the  life  history  of  a variety  of  equipment  types,  generally  including 
electronic  devices  and  complex  equipment.  The  second  reason  is  a 
theoretical  one,  and  in  some  respects  it  is  the  more  basic.  Because  it 
has  a constant  hazard  rate,  the  exponential  distribution  separates  the 
survival  distributions  which  have  an  increasing  hazard  rate  from  those 
which have a decreasing hazard rate, and it therefore also separates two
completely different types of maintenance policies.

It  is  clear  that  if  an  item  has  a non-increasing  hazard  rate,  then 
there  is  no  advantage  gained  in  replacing  that  item  by  a new  one  at  any 
time  prior  to  its  failure.  Indeed,  if  the  hazard  rate  is  strictly 
decreasing  with  time,  then  replacement  substitutes  an  item  with  a greater 
probability  of  failure  for  the  one  already  in  operation.  If,  however, 
the  hazard  rate  is  strictly  increasing,  then  replacement  of  the  item  by 
a new  one  will  increase  the  probability  of  survival.  In  this  case  the 
main  issue  is  the  cost  of  replacement  maintenance,  and  a principal 
mathematical  problem  is  to  determine  replacement  intervals  which  are 
optimal with respect to some mix of acceptable failure rate and
maintenance cost. The exponential distribution separates these
fundamentally different classes of survival distributions and
maintenance policies.
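
The comparison of a used item with a new one can be made explicit. The
following short derivation is only a sketch: it assumes the relation
$n(t) = -\,d \log_e R(t)/dt$ between hazard rate and survival
distribution together with $R(0) = 1$, and the symbol $x$ for an
additional period of operation is introduced here purely for
illustration. The probability that an item of age $t$ survives a
further period $x$, and the corresponding probability for a new item,
are

$$ \frac{R(t + x)}{R(t)} = \exp\left(-\int_t^{t+x} n(u)\,du\right) , \qquad R(x) = \exp\left(-\int_0^x n(u)\,du\right) . $$

If $n$ is non-increasing, then $n(t + v) \le n(v)$ for $0 \le v \le x$,
so the first integral does not exceed the second and
$R(t + x)/R(t) \ge R(x)$: the item already in operation is at least as
likely to survive as its replacement. If $n$ is non-decreasing, the
inequalities are reversed and replacement does not reduce the
probability of survival.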

This is especially fortunate because the exponential distribution has
particularly simple mathematical properties which often make it possible
to carry out technical analyses in complete and rigorous detail, thereby
obtaining lower or upper bounds for the properties of general survival
distributions with non-decreasing or non-increasing hazard rates. This
is one main reason why a large portion of the reliability theory
literature is devoted to the study of systems whose constituents have
exponential distributions.

The above remarks can be made more precise. Let $R(t)$ be a survival
distribution with corresponding hazard rate $n(t)$, and suppose that $n$
is either non-increasing or non-decreasing for $0 \le t \le T$. Then

$$ n(t) \le n(0) \tag{4.7} $$

or

$$ n(t) \ge n(0) \tag{4.8} $$

respectively, according as $n$ is non-increasing or non-decreasing.
Hence

$$ -\frac{d \log_e R(t)}{dt} = n(t) \;\begin{matrix} \le \\ \ge \end{matrix}\; n(0) \tag{4.9} $$

implies

$$ \log_e R(t) \;\begin{matrix} \ge \\ \le \end{matrix}\; -\int_0^t n(0)\,dt = -n(0)\,t \tag{4.10} $$

for $t \le T$, where the upper (respectively, lower) inequality
corresponds to a non-increasing (respectively, non-decreasing) hazard
rate. Since exponentiation preserves the direction of an inequality, we
find

$$ R(t) \ge \exp(-n(0)\,t) , \qquad 0 \le t \le T \tag{4.11} $$

if the hazard rate $n(t)$ corresponding to $R(t)$ is non-increasing for
$0 \le t \le T$, and

$$ R(t) \le \exp(-n(0)\,t) , \qquad 0 \le t \le T \tag{4.12} $$

if $n(t)$ is non-decreasing on $0 \le t \le T$, as claimed.

That an easily analyzed survival distribution separates the two
classes is an important and useful fact. But there is an unexpected
bonus: the single parameter $n(0)$, which specifies the exponential
distribution, is equal to $p(0)/R(0)$ (even in the time renormalized
form $R(t) = \exp(-\lambda (t + t_0))$), and consequently it can be
estimated from data collected during the early life history of the
item. Therefore, if there are reasons to believe that $n(t)$ is either
non-increasing or non-decreasing, then $R(t)$ can be bounded by an
exponential distribution whose parameter (= hazard rate) can be
realistically estimated.
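
The bounds (4.11) and (4.12) can be checked numerically. The following
is a minimal sketch in Python, assuming a hypothetical linear hazard
rate n(t) = a + b t (so that n(0) = a), for which the survival
distribution has the closed form R(t) = exp(-(a t + b t^2 / 2)). The
function names and parameter values are illustrative assumptions and do
not come from the text.

```python
import math

def survival_linear_hazard(t, a, b):
    """R(t) = exp(-integral_0^t (a + b*u) du) for the hazard n(t) = a + b*t."""
    return math.exp(-(a * t + 0.5 * b * t * t))

def exponential_bound(t, n0):
    """Exponential survival distribution with constant hazard rate n(0)."""
    return math.exp(-n0 * t)

a = 0.002                 # n(0), assumed value (failures per hour)
T = 500.0                 # end of the interval on which n is monotone
times = [i * T / 50 for i in range(51)]

# Non-decreasing hazard (b > 0): R(t) <= exp(-n(0) t), as in eq. (4.12).
increasing_ok = all(
    survival_linear_hazard(t, a, 2e-6) <= exponential_bound(t, a) for t in times
)

# Non-increasing hazard (b < 0, with a + b*T still positive):
# R(t) >= exp(-n(0) t), as in eq. (4.11).
decreasing_ok = all(
    survival_linear_hazard(t, a, -2e-6) >= exponential_bound(t, a) for t in times
)

print(increasing_ok, decreasing_ok)
```

In both cases only n(0) is needed to write down the bounding
exponential, which is the point made above about estimation from early
life data.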


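The truncated normal distribution is also frequently encountered in
applications.

If the mean of the normal distribution is positive and large in
comparison with the standard deviation, then truncation by restricting
its domain to the set of non-negative $t$ will not result in practical
difficulties. Otherwise, the truncated distribution must be normalized
to ensure that $R(0) = 1$. To do this, define

$$ A = \frac{1}{\sigma\sqrt{2\pi}} \int_0^\infty \exp\left(-\frac{1}{2}\left(\frac{u - \bar{t}}{\sigma}\right)^2\right) du . \tag{4.13} $$

Then the truncated normal survival distribution is

$$ R(t) = \frac{1}{A\sigma\sqrt{2\pi}} \int_t^\infty \exp\left(-\frac{1}{2}\left(\frac{u - \bar{t}}{\sigma}\right)^2\right) du , \qquad t \ge 0 , \tag{4.14} $$

where $\bar{t}$ and $\sigma$ are, respectively, the mean and standard
deviation of the untruncated normal distribution. The associated
truncated normal survival density function is

$$ p(t) = -\frac{dR}{dt} = \frac{1}{A\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{t - \bar{t}}{\sigma}\right)^2\right) , \tag{4.15} $$

and the truncated normal hazard rate is

$$ n(t) = \frac{\exp\left(-\frac{1}{2}\left(\frac{t - \bar{t}}{\sigma}\right)^2\right)}{\int_t^\infty \exp\left(-\frac{1}{2}\left(\frac{u - \bar{t}}{\sigma}\right)^2\right) du} . \tag{4.16} $$

Notice that the hazard rate is independent of the truncation
normalization factor $A$. These functions and an application are
illustrated in Figures 4.3 and 4.4.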
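
As an illustration of eqs. (4.13) to (4.16), the following Python
sketch evaluates the truncated normal survival distribution, density,
and hazard rate with the complementary error function from the standard
library. The mean and standard deviation are hypothetical values chosen
so that truncation matters; the function names are likewise
illustrative.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def gaussian_tail(t, mean, sigma):
    """Integral from t to infinity of exp(-0.5 * ((u - mean) / sigma)**2) du."""
    return sigma * SQRT_2PI * 0.5 * math.erfc((t - mean) / (sigma * math.sqrt(2.0)))

def truncation_factor(mean, sigma):
    """Normalization factor A of eq. (4.13)."""
    return gaussian_tail(0.0, mean, sigma) / (sigma * SQRT_2PI)

def survival(t, mean, sigma):
    """Truncated normal survival distribution R(t), eq. (4.14)."""
    A = truncation_factor(mean, sigma)
    return gaussian_tail(t, mean, sigma) / (A * sigma * SQRT_2PI)

def density(t, mean, sigma):
    """Truncated normal survival density p(t), eq. (4.15)."""
    A = truncation_factor(mean, sigma)
    kernel = math.exp(-0.5 * ((t - mean) / sigma) ** 2)
    return kernel / (A * sigma * SQRT_2PI)

def hazard(t, mean, sigma):
    """Truncated normal hazard rate n(t), eq. (4.16); A cancels out."""
    kernel = math.exp(-0.5 * ((t - mean) / sigma) ** 2)
    return kernel / gaussian_tail(t, mean, sigma)

mean, sigma = 300.0, 400.0     # assumed values; mean is not large relative to sigma
print(truncation_factor(mean, sigma))   # noticeably less than 1
print(survival(0.0, mean, sigma))       # normalized so that R(0) = 1
print(hazard(200.0, mean, sigma), hazard(800.0, mean, sigma))
```

The hazard function never uses `truncation_factor`, which mirrors the
remark that the hazard rate is independent of A.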


Figure 4.3. Truncated Normal Distribution, Density, and Hazard Rate;

SOURCE: United States Air Force, Procedures for Determining Aircraft
Engine (Propulsion Unit) Failure Rates, Actuarial Engine Life, and
Forecasting Monthly Engine Changes by the Actuarial Method, Technical
Order TO 00-25-128, October 20, 1959.

NOTE: $\hat{\mu}$ and $\hat{\sigma}$ are maximum likelihood estimates
of $\mu$ and $\sigma$. The Appendix describes the techniques used to
obtain them. Using these estimates, a chi-square goodness-of-fit test
was performed. The hypothesis of normality could not be rejected at the
20-percent significance level. Results differing from these by less
than 1 percent were given in a curve fitted by the rules of
E. B. Ferrell, "Plotting Experimental Data on Normal or Log-Normal
Probability Paper," Industrial Quality Control, Vol. 15, 1958,
pp. 12-15.

Figure 4.4. Truncated Normal Survival Distribution: J57-F-59
and J57-P59 Jet Engines (Normal probability graph paper)

The Weibull distribution was introduced in 1951 by the Swedish
statistician Waloddi Weibull in order to describe the tensile strength
of steel [14]. It has since been applied to a variety of reliability
problems. The Weibull survival distribution is defined on
$0 \le t < \infty$ and assumes the form

$$ R(t) = \exp(-\lambda t^s) , \qquad \lambda > 0 ,\ s > 0 . \tag{4.17} $$

The corresponding Weibull survival density function is

$$ p(t) = -\frac{dR}{dt} = \lambda s t^{s-1} \exp(-\lambda t^s) , \tag{4.18} $$

and the Weibull hazard rate takes the form

$$ n(t) = \lambda s t^{s-1} . \tag{4.19} $$

Observe that if $s = 1$, then the Weibull distribution reduces to the
exponential distribution with parameter $\lambda$. The hazard rate is
increasing if $s > 1$ and decreasing if $0 < s < 1$. One can think of
the Weibull hazard function as the best power-function approximation to
an arbitrary continuous hazard rate in a neighborhood of $t = 0$.

Graphs of the Weibull distribution, density, and hazard rate and an
application appear in Figures 4.5 and 4.6.
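
The three Weibull functions are simple enough to evaluate directly.
The following Python sketch, with hypothetical parameter values and
illustrative function names, implements eqs. (4.17) to (4.19) and
reports how the hazard rate behaves for a shape parameter below, at,
and above 1.

```python
import math

def weibull_survival(t, lam, s):
    """R(t) = exp(-lam * t**s), eq. (4.17)."""
    return math.exp(-lam * t ** s)

def weibull_density(t, lam, s):
    """p(t) = lam * s * t**(s-1) * exp(-lam * t**s), eq. (4.18)."""
    return lam * s * t ** (s - 1.0) * math.exp(-lam * t ** s)

def weibull_hazard(t, lam, s):
    """n(t) = lam * s * t**(s-1), eq. (4.19)."""
    return lam * s * t ** (s - 1.0)

lam = 0.5   # assumed scale parameter
for s in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, lam, s)
    late = weibull_hazard(2.0, lam, s)
    if late > early:
        trend = "increasing"
    elif late < early:
        trend = "decreasing"
    else:
        trend = "constant"
    print(f"s = {s}: hazard rate is {trend}")
```

For s = 1 the hazard rate equals `lam` at every age, which is the
exponential case.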


Figure  4.6.  Weibull  Survival  Distribution:  J47-GE-27  Jet  Engines  (Log-log  graph  paper) 


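The lognormal survival distribution appears to be finding increasing
favor as a candidate for the description of survival data. It has been
applied to the description of crack growth as a function of time in
primary aircraft structures [3] and to jet engine compressor bleed
control data (cp. Figure 4.8).

The lognormal survival distribution is

$$ R(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_t^\infty \frac{1}{u} \exp\left(-\frac{1}{2}\left(\frac{\log_e u - \overline{\log_e t}}{\sigma}\right)^2\right) du , \tag{4.20} $$

where $0 < t$, $\overline{\log_e t}$ is the mean of $\log_e t$, and
$\sigma$ is the standard deviation. The corresponding lognormal
survival density is

$$ p(t) = \frac{1}{\sigma t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log_e t - \overline{\log_e t}}{\sigma}\right)^2\right) , \tag{4.21} $$

and the lognormal hazard rate is

$$ n(t) = \frac{\frac{1}{t} \exp\left(-\frac{1}{2}\left(\frac{\log_e t - \overline{\log_e t}}{\sigma}\right)^2\right)}{\int_t^\infty \frac{1}{u} \exp\left(-\frac{1}{2}\left(\frac{\log_e u - \overline{\log_e t}}{\sigma}\right)^2\right) du} . \tag{4.22} $$

Graphs of these functions are displayed in Figure 4.7; Figure 4.8
exhibits an application to observations. Notice that the hazard rate
(Figure 4.7.4) increases at first, attains a maximum, and decreases
thereafter. While this behavior is not often observed, the example
illustrated in Figure 4.8 suggests that the lognormal distribution may
be appropriate for some special types of aviation failure phenomena.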
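
The rise and fall of the lognormal hazard rate can be seen numerically.
The Python sketch below is illustrative only: the median life of 1000
hours and the value sigma = 1 are assumptions, and the survival
distribution is evaluated through the complementary error function
rather than by integrating eq. (4.20) directly.

```python
import math

def lognormal_survival(t, mean_log, sigma):
    """R(t): probability that log(lifetime) exceeds log(t)."""
    z = (math.log(t) - mean_log) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lognormal_density(t, mean_log, sigma):
    """p(t) of eq. (4.21)."""
    z = (math.log(t) - mean_log) / sigma
    return math.exp(-0.5 * z * z) / (sigma * t * math.sqrt(2.0 * math.pi))

def lognormal_hazard(t, mean_log, sigma):
    """n(t) = p(t) / R(t), eq. (4.22)."""
    return lognormal_density(t, mean_log, sigma) / lognormal_survival(t, mean_log, sigma)

mean_log, sigma = math.log(1000.0), 1.0        # assumed parameters
times = [50.0 * k for k in range(1, 201)]      # 50, 100, ..., 10000 hours
hazards = [lognormal_hazard(t, mean_log, sigma) for t in times]

# Index of the largest hazard rate on the grid.
peak = max(range(len(times)), key=hazards.__getitem__)
print("hazard rate peaks near t =", times[peak])
```

The peak falls strictly inside the grid, with smaller hazard rates at
both the earliest and the latest ages sampled.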


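Gamma Survival Distribution

This distribution generalizes both the lognormal and the exponential
distributions. The gamma survival distribution is

$$ R(t) = \frac{1}{\Gamma(s)} \int_{\lambda t}^\infty e^{-u} u^{s-1}\,du , \qquad s > 0 ,\ \lambda > 0 , \tag{4.23} $$

where $t > 0$ and

$$ \Gamma(s) = \int_0^\infty e^{-u} u^{s-1}\,du \tag{4.24} $$

is Euler's gamma function. If $s = 1$, then the gamma distribution
reduces to the exponential distribution $R(t) = \exp(-\lambda t)$.

If $s = \tfrac{1}{2}$, then the substitution of variables

$$ u = \frac{1}{2}\left(\frac{\log_e v - \overline{\log_e t}}{\sigma}\right)^2 \tag{4.25} $$

transforms the gamma distribution Eq. (4.23) into a lognormal
distribution relative to a pseudo time variable
$x = \exp\left(\overline{\log_e t} + \sigma\sqrt{2\lambda t}\right)$.
The gamma survival density is

$$ p(t) = \frac{\lambda^s t^{s-1}}{\Gamma(s)} \exp(-\lambda t) , \tag{4.26} $$

and the gamma hazard rate is given by

$$ n(t) = \frac{\lambda^s t^{s-1} \exp(-\lambda t)}{\int_{\lambda t}^\infty e^{-u} u^{s-1}\,du} . \tag{4.27} $$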
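
The gamma survival distribution has no elementary closed form for
general s, but it is easy to approximate. The following Python sketch
is one possible approach, not a method taken from the text: it
estimates the integral in eq. (4.23) with Simpson's rule on a finite
interval (the cut-off `span` and step count `n` are assumptions) and
compares the result with the closed forms available for s = 1 and
s = 2. It assumes t > 0.

```python
import math

def upper_incomplete_gamma(x, s, span=60.0, n=6000):
    """Simpson's-rule estimate of the integral of exp(-u) * u**(s-1)
    from x to x + span; the tail beyond x + span is neglected."""
    h = span / n                      # n must be even for Simpson's rule
    def f(u):
        return math.exp(-u) * u ** (s - 1.0)
    total = f(x) + f(x + span)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(x + i * h)
    return total * h / 3.0

def gamma_survival(t, lam, s):
    """R(t) of eq. (4.23)."""
    return upper_incomplete_gamma(lam * t, s) / math.gamma(s)

def gamma_density(t, lam, s):
    """p(t) of eq. (4.26)."""
    return lam ** s * t ** (s - 1.0) * math.exp(-lam * t) / math.gamma(s)

def gamma_hazard(t, lam, s):
    """n(t) of eq. (4.27)."""
    return lam ** s * t ** (s - 1.0) * math.exp(-lam * t) / upper_incomplete_gamma(lam * t, s)

lam, t = 0.5, 2.0    # assumed values, so that lam * t = 1
print(gamma_survival(t, lam, 1.0), math.exp(-1.0))         # s = 1: exponential case
print(gamma_survival(t, lam, 2.0), 2.0 * math.exp(-1.0))   # s = 2: (1 + lam*t) * exp(-lam*t)
```

For s = 1 the hazard rate computed this way is approximately `lam`
at every age, in agreement with the exponential case.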


Their graphs and an application are displayed in Figures 4.9 and 4.10.

5. SIMPLE AND COMPLEX SYSTEMS

5.1 The statistical study of reliability had its origin in demography,
and its terminology reflects this history. The survival distribution,
which specifies the probability that an individual belonging to a
homogeneous population will survive until time $t$, yields a hazard
function which, as time increases from birth until death, initially
decreases from large values during an interval of infant mortality,
remains relatively constant for some time, and then, as the wear-out
interval of old age is attained, once again increases. Thus the graph
of the hazard function is a shallow U-shaped curve, frequently called
the "bathtub curve" in reliability literature; cp. Figure 5.1.


Figure 5.1. "Bathtub" Hazard Function

The reader will have noticed that none of the standard reliability
distributions described in Section 4 gives rise to a hazard function
whose graph is U-shaped. For instance, the Weibull distribution
corresponds to a hazard function of the form (cp. eq (4.19))

$$ n(t) = \lambda s t^{s-1} , \qquad \lambda > 0 ,\ s > 0 , \tag{5.1} $$

which increases with increasing time if $s > 1$, is constant if
$s = 1$, and decreases with increasing time if $s < 1$. The hazard
function for the lognormal survival distribution increases to a maximum
and then decreases as time increases. Such functions can be used to
describe the infant mortality regime of a hazard function, or the
wear-out regime, but not both. This remark has an important
consequence: if an item displays infant mortality characteristics, that
is, if $n(t)$ is a decreasing function for small positive $t$, then
$n(t)$ can only be represented by one of the standard distributions
(such as those described in Section 4) for epochs much earlier than
typical wear-out epochs, since the infant mortality data is necessarily
acquired first. There can be no solely mathematical method for gaining
information about wear-out characteristics from data which includes
infant mortalities.

The reason for this state of affairs is plain in human mortality
characteristics. Although the statistical properties of infant
mortality and wear-out at old age are separately highly regular and
susceptible to statistical analysis, their causes and corresponding
hazard functions are very different. There is no reason to believe that
any one mathematically simple statistical distribution can be related
to the underlying physical phenomena which correspond to both extremes.
It thus becomes necessary to think of the U-shaped hazard function as
the sum of (at least) two independent functions,

$$ n(t) = n_0(t) + n_w(t) , \tag{5.2} $$

where $n_0(t)$ describes the hazard due to infant mortality and
$n_w(t)$ describes that due to wear. One could argue for including a
third hazard function to describe hazard at intermediate ages, as is
done in demographic analysis, but this will not be necessary for our
present purpose.
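
A decomposition of the form (5.2) can be illustrated numerically. The
Python sketch below assumes, purely for illustration, that each of the
two components is a Weibull hazard rate, one with shape parameter below
1 (infant mortality) and one with shape parameter above 1 (wear-out);
the text does not prescribe these forms, and all parameter values are
hypothetical. Because the survival distribution is the exponential of
minus the integrated hazard, a sum of hazard rates corresponds to a
product of survival distributions.

```python
import math

# Assumed component parameters (lam, s) for hazards of the form lam * s * t**(s-1).
INFANT = (0.05, 0.5)     # decreasing hazard rate
WEAR = (1e-9, 3.0)       # increasing hazard rate

def component_hazard(t, lam, s):
    return lam * s * t ** (s - 1.0)

def bathtub_hazard(t):
    """n(t) = n_0(t) + n_w(t), as in eq. (5.2)."""
    return component_hazard(t, *INFANT) + component_hazard(t, *WEAR)

def bathtub_survival(t):
    """R(t) = exp(-integrated hazard) = product of the two component survivals."""
    return math.exp(-(INFANT[0] * t ** INFANT[1] + WEAR[0] * t ** WEAR[1]))

times = [10.0 * k for k in range(1, 301)]          # 10, 20, ..., 3000 hours
hazards = [bathtub_hazard(t) for t in times]

# Index of the smallest hazard rate on the grid: the bottom of the "bathtub".
low = min(range(len(times)), key=hazards.__getitem__)
print("hazard rate is lowest near t =", times[low])
print(hazards[0], hazards[low], hazards[-1])
```

Neither component alone has a minimum in the interior of the age range;
only their sum does.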

With $n$ decomposed as in eq (5.2) above, and supposing that both
infant mortality and wear-out are present for the items under
consideration, we may think of $n_0$ as a decreasing function which
tends to a limit $k_0$, $0 < k_0 = \lim_{t \to \infty} n_0(t)$. It is
clear that without further detailed information about the (physical)
characteristics of the item under consideration, analysis of early
failure data cannot lead to any conclusions about wear-out if infant
mortality persists for a significant period of time.

Reliability theoreticians are consequently constrained to study
specific systems for which it is possible, on physical or other
grounds, to determine $n_0$ and $n_w$ independently, or to study
systems for which either infant mortality or wear-out (or both) are
negligible. They are faced with an additional difficulty. Complex
systems whose constituents follow various distinct survival
distributions, or the same distribution with a variety of parameter
values, are not amenable to rigorous analysis.

For these reasons, the bulk of the theoretical literature concerned
with reliability is devoted to simple (one-celled) items for which the
hazard function is assumed either non-decreasing (wear-out) or
non-increasing (infant mortality), the constant hazard function of the
exponential survival distribution being a special case of both, and to
configurations of identical or closely related simple items which
possess special symmetries, e.g., series- or parallel-connected simple
items.

With these constraints it may be possible to derive optimal maintenance
policies if the family of policies considered is sufficiently
structured. Perhaps the most popular structural policy constraint is
maintenance periodicity.
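
To make the idea of an optimal periodic policy concrete, the following
Python sketch uses the classical age-replacement cost-rate model from
the general reliability literature; it is not a model stated in this
appendix. An item is replaced at failure (cost `cost_failure`) or on
reaching age T (cost `cost_planned`), whichever comes first, and the
long-run cost per unit time is the expected cost of one replacement
cycle divided by its expected length. The Weibull parameters, the
costs, and the candidate intervals are all hypothetical.

```python
import math

def weibull_survival(t, lam, s):
    return math.exp(-lam * t ** s)

def mean_cycle_length(T, lam, s, steps=2000):
    """Trapezoidal estimate of the integral of R(t) from 0 to T."""
    h = T / steps
    total = 0.5 * (weibull_survival(0.0, lam, s) + weibull_survival(T, lam, s))
    for i in range(1, steps):
        total += weibull_survival(i * h, lam, s)
    return total * h

def cost_rate(T, lam, s, cost_planned, cost_failure):
    """Expected cost per cycle divided by expected cycle length."""
    r = weibull_survival(T, lam, s)
    expected_cost = cost_planned * r + cost_failure * (1.0 - r)
    return expected_cost / mean_cycle_length(T, lam, s)

cost_planned, cost_failure = 1.0, 10.0
candidates = [50.0 * k for k in range(1, 81)]      # 50, 100, ..., 4000 hours

# Increasing hazard rate (s = 2): an interior replacement age is cheapest.
best_T = min(candidates, key=lambda T: cost_rate(T, 1e-6, 2.0, cost_planned, cost_failure))

# Constant hazard rate (s = 1): the longest interval is cheapest, i.e.
# scheduled replacement brings no benefit.
best_T_constant = min(candidates, key=lambda T: cost_rate(T, 1e-3, 1.0, cost_planned, cost_failure))

print(best_T, best_T_constant)
```

The contrast between the two cases is the numerical counterpart of the
earlier observation that replacement before failure pays only when the
hazard rate increases with age.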

Many simple items exhibit wear. If replicas of such an item are
expected to be in service for a period significantly longer than the
lifetime of an individual item and if single items are producible at
low cost and in great number, then age exploration, or life testing,
will establish the hazard function from observations and thereby
identify it as a standard hazard function, amenable to theoretical
study, if it happens to be one. If an analytical expression is not
known, an approximation can be obtained (e.g., following the
prescription given in [13]), or numerical methods can be used to carry
out the computations called for by theoretical analyses. In this case
there is no problem in principle in applying standard methods of
reliability theory.

If, however, the expected operational lifetime prior to obsolescence
of the type of item is comparable with the expected lifetime of an
individual item of that type, then, unless accelerated testing is
possible, there will be no time for age exploration; wear
characteristics must be derived from some more basic, usually physical,
argument, or hypothesized based on related prior experience, or
analyses founded on explicit knowledge of the hazard function must be
forgone.

There are important categories of items for which the survival
distribution is a standard distribution, and the parameter values can
be estimated from actuarial analysis. An extensive analysis of survival
distributions was reported by D. J. Davis [1]. Among his findings were
that the exponential survival distribution was characteristic of such
devices as

• commercial aircraft radio tubes,
• Linotype machines,
• automated mechanical calculating machines,
• ball bearings.

All but the last are now obsolete. It has since been reported that most
electronic systems and most 'complicated' systems also fall into this
category. Aircraft engines, however, usually exhibit some degree of
wear-out, i.e., their hazard function ultimately increases with time
(cp. Figures 4.4, 4.6, and 4.10, but also Figure 4.2).

Typical studies of preventive maintenance policies for simple systems
assume that the actual state of the item is known at all times prior to
failure, including the associated survival distribution. The time of
failure of the item is the only unknown. Moreover, typical maintenance
actions are restricted to replacement of a given item by an identical
zero-timed item, thus 'renewing' the system of which the item is a
constituent. Generally, the problem treated is determination of the
time of replacement (renewal) to minimize cost or meet a numerically
expressed safety requirement, or to introduce redundancy (i.e., create
a symmetrically interconnected collection of replicas of the simple
item to form a simple system) in order to reduce the failure rate.

When the items which constitute a system are essentially identical
and are interconnected in a symmetrical way (e.g., series- or
parallel-interconnection) and when the survival distribution
corresponding to each individual item is known, then it may be possible
to perform a complete mathematical analysis of the reliability of the
system. Systems for which one or more of these assumptions are invalid
can be called complex systems. This definition differs in an
inessential way from that given in Chapter 4, Section 2. The combined
vehicle- and earth-based control systems for the Apollo and Viking
projects are examples of one-time complex systems for which neither
complete age exploration nor accelerated testing to determine survival
characteristics was possible. This deficiency was compensated, to some
extent, by the extensive use of redundancy. Nevertheless, it is clear
that a complete mathematical reliability analysis for such a system is
out of the question.

Commercial and military aircraft are examples of complex systems
about which much more can be learned through testing, age exploration,
and experience because there are, relatively, so many more of them and,
ultimately, they are in operation for a long period of time. But for
them also a complete mathematical analysis is out of the question
because of the large number of diverse items, each with its own
survival characteristics, and the complex and irregular
interconnections and multiple uses and paths which have been designed
into modern aircraft, or are unintentional consequences of the design.
Moreover, aircraft are modified as time passes to incorporate new
developments in assembly and subsystem design, and maintenance
activities quickly ensure that the ages of various subsystems, both
major and minor, bear little relationship to the nameplate age of the
airframe.

Just as the (classical) properties of a gas cannot, in practice, be
derived from knowledge of Newton's equations although the latter
suffice in principle for the task, so too the survival characteristics
of a complex system could not be obtained in practice even if complete
knowledge of the survival characteristics of its constituent parts as
well as the details of their interconnection were available. An
alternative method is needed, less sensitive to the 'microscopic'
structure of the complex system and therefore necessarily of
insufficient power to treat all conceivable questions, but powerful
enough nevertheless to guide the formulation of maintenance policy. To
continue this simile, the relationship of a method for analysis of
complex systems to the traditional method for analysis of simple
systems can be likened to the relationship between statistical
mechanics and Newtonian mechanics: detailed knowledge about individual
items and their interconnection will, in general, not play an explicit
role, but the method will provide the decisive information which is
used to formulate answers to the basic maintenance policy questions.

The Reliability-Centered Maintenance Program [6] described in this
volume is a general method of designing maintenance policies for
complex systems which requires very little explicit 'microscopic'
knowledge of survival distributions and interconnections for the tens
of thousands of constituents of a commercial aircraft. The next Section
is devoted to a mathematical description of the structure of this
Program.


6. RELIABILITY-CENTERED MAINTENANCE

6.1 The principal goal of a maintenance system is to ensure the highest
practical standard of operating performance of the equipment being
maintained. Criteria of operating performance are, however, quite
varied, depending simultaneously upon the cost of maintenance and the
consequences of failure. For circumstances where the consequences of
failure are relatively minor it will generally be sufficient to focus
on the reliability of the constituent items of the system, and to learn
from experience as well as from testing whether component redesign is
necessary and which maintenance policies are cost-effective. As such
information accumulates, maintenance policies and system design evolve
together to improve operating reliability.

Those systems for which the consequences of failure are serious,
such as commercial aircraft, nuclear reactors, and military missile
systems, must be considered from a different point of view. In each of
these instances, the consequences of certain failures are unacceptable.
Critical failures in the sense of Chapter 3.2 belong to this category.
It will be convenient to refer to any unacceptable failure as a
critical failure. The criteria of unacceptability may be quite
complicated in any specific instance, although certain types of failure
will normally be clearly unacceptable. For example, a failure in a
military missile which destroyed its ability to complete its mission
would be unacceptable, as would a failure of a nuclear power reactor
situated in a densely populated region which could lead to an
explosion.

In situations such as these there is the temptation to avoid failure
"at all costs," but, since there are always practical limitations to
the resources which can be brought to bear on any single problem, and
also because in certain complicated circumstances it is not possible to
obtain all of the potentially valuable information which would in
principle be necessary to avoid failure, the attempt to avoid critical
failures must inherently be a compromise between the imputed cost of
the failure and the cost of procedures that would decrease the
probability of failure.

For complex systems such as commercial aircraft, it would be
prohibitively costly to devote serious and scheduled maintenance to
each of their tens of thousands of parts. But of greater importance is
the observation that intensive scheduled maintenance (be it "Hard Time"
or "On Condition"; cp. Chapter 5), regardless of cost, will not
necessarily reduce the probability of critical failures. This suggests
that the constituent items of a system should be analyzed with regard
to the consequences of their failure rather than merely with regard to
their reliability. If the consequences of failure are acceptable, then,
in the absence of some other reason unrelated to criticality of
failure, the maintenance policy designer need not and should not devote
resources to scheduled maintenance of the item. The recognition of the
importance of the functional role and consequences of failure of an
item is a basic principle of the Reliability-Centered Maintenance
Program; cp. the extensive discussion in Chapter 3. Its main practical
consequence in the case of commercial aircraft is that, of the tens of
thousands of items which are part of an aircraft, only several hundred
participate in critical failures and therefore the latter are the only
candidates for scheduled maintenance procedures.

It may turn out that an item participates in critical failures but
cannot benefit from scheduled maintenance. There may not be any way to
detect reduced resistance to failure. One resolution of this dilemma is
to redesign the item to avoid participation in critical failures or so
that reduced resistance to failure can be detected by scheduled
maintenance operations. The latter solution is an instance of another
important principle of the Reliability-Centered Maintenance Program:
items which participate in critical failures should be replaced by
items which convert critical failures to non-critical failures or to a
mode of reduced resistance to failure which can be detected by
scheduled maintenance operations. One consequence of this policy is
that it may lead to an increase in the number of failures or equipment
replacements, thereby increasing maintenance costs; but, by reducing
the probability of critical failures, it reduces the imputed large
costs of critical failures. Thus, application of the
Reliability-Centered Maintenance Program simultaneously

• Reduces the probability of critical failure;

• Reduces maintenance costs by reducing the number of items
  considered for scheduled maintenance;

• Increases maintenance costs by replacement of items whose
  reduced resistance to failure is unobservable by items whose
  reduced resistance is detectable by scheduled maintenance, or
  whose failure is non-critical.


The remainder of this Section provides a mathematical formulation
of the preceding ideas. There are three main mathematical aspects. The
first corresponds to the partition of the system into sets of items
that are functionally related by means of the consequences of their
failure (cp. Chapter 7). The second is the formal expression of the
costs of maintenance and consequences of failure in common terms of
direct and imputed costs. This maintenance/failure cost function is
really the main object of study. The principal purpose of the
maintenance policy designer is to minimize the maintenance/failure cost
function. The third mathematical aspect models the iterative procedure
used in the Reliability-Centered Maintenance Program to minimize the
total cost function. The Decision Diagram approach of Chapter 8 is the
main component of this part of the Program.

6.2 Every complex system is composed of many individual parts or items.
These constituents are not necessarily in one-to-one correspondence
with functions performed by the system. Most physically distinct parts
perform no function at all in isolation; some may be cannily designed
to participate in the performance of several distinct functions (as an
airliner seat cushion is also a flotation device). Thus it is
impossible to identify parts with functions or roles, and it may not
even be possible to obtain complete agreement upon what constitutes the
set of elementary, or irreducible, items of a complex system. We will
assume that some choice has been made. The volume titled
Reliability-Centered Maintenance [7] provides a detailed description of
one procedure that can be followed to make this selection for
commercial aircraft.

Let $S$ denote the set of items of some complex system and let $s$
denote a typical item belonging to $S$. Items of a given type may occur
more than once in the system; each occurrence is represented by a
distinct element of $S$. We may think of the items which constitute the
system as represented by points, and of $S$ as the set of those points;
this interpretation is used in Figure 6.1.


Figure  6.1.  Set  of  Items  of  a Complex  System 

To each $s \in S$ there corresponds an associated survival
distribution $t \mapsto R_s(t)$, where we suppose that some
satisfactory definition of failure for $s$ has been selected. The
reader should recall the extensive discussion of this difficult problem
in Chapter 3. With an appropriate definition of failure for the system
$S$ itself, let $R_S(t)$ denote the survival distribution for $S$. If
$R_S(t)$ could be readily expressed in terms of the $R_s(t)$,
$s \in S$, then the problem of maintenance policy design would be
reduced to the establishment of a maintenance procedure for each
$s \in S$ which ensures that $R_S(t) > k$ (where $k$ is a given minimal
acceptable system reliability) and, subject to that constraint, costs
least to implement. In other words, programs developed using the
techniques of reliability-centered maintenance tend towards minimizing
all costs that are a function of scheduled maintenance.
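
The constrained problem just described can be illustrated with a toy
example. The Python sketch below is hypothetical throughout: it
assumes, solely for illustration, that the items fail independently so
that system survival is the product of item survivals, and it invents
three items, each with a small menu of maintenance options given as
(cost, probability that the item survives to the time of interest). It
then searches exhaustively for the cheapest combination whose system
reliability exceeds k.

```python
from itertools import product
from math import prod

# Hypothetical maintenance options per item: (cost, item survival probability).
OPTIONS = {
    "pump":   [(0.0, 0.90), (2.0, 0.97), (5.0, 0.995)],
    "valve":  [(0.0, 0.95), (1.0, 0.99)],
    "sensor": [(0.0, 0.92), (3.0, 0.98)],
}

def cheapest_policy(options, k):
    """Return (cost, system reliability, chosen options) for the cheapest
    combination with system reliability greater than k, or None."""
    names = list(options)
    best = None
    for choice in product(*(options[name] for name in names)):
        cost = sum(c for c, _ in choice)
        # Product form: valid only under the independence assumption.
        reliability = prod(r for _, r in choice)
        if reliability > k and (best is None or cost < best[0]):
            best = (cost, reliability, dict(zip(names, choice)))
    return best

best = cheapest_policy(OPTIONS, k=0.93)
print(best)
```

Exhaustive search is feasible only for a handful of items; for the tens
of thousands of interconnected parts of an aircraft neither the product
form nor the enumeration is available, which is the difficulty taken up
next.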


But $R_S(t)$ cannot be explicitly expressed in terms of the $R_s(t)$
for complex systems consisting of numerous parts. The set
$\{R_s(t) : s \in S\}$ of survival distributions does not contain all
the information necessary for the analytical solution of the problem
because the components $s$ of the system are in general interconnected
and, therefore, at least some of the survival distributions $R_s(t)$
are not independent. Suppose, for the moment, that the probability of
survival of each $s \in S$ were independent of the probability of
survival of the remaining items. Then

$$ R_S(t) = \prod_{s \in S} R_s(t) , \tag{6.1} $$

and this relationship would enable one directly to reduce all questions
about system survival to questions about the survival characteristics
of the elementary items, ignoring their interconnection. Since the
$R_s(t)$ are, in general, dependent, we have the choice of studying the
interconnection of the items or avoiding consideration of elementary
items altogether. The first alternative is typical of the standard
methods reported in the literature. The second alternative has received
much less attention (a recent analysis which adopts this viewpoint is
reported

jundation  of  the  Reliability-Centered  Mainte- 

We need some terminology. If S is any set, then a partition of S
is a collection $\Lambda$ of subsets $\lambda$ such that

$$\bigcup_{\lambda \in \Lambda} \lambda = S \tag{6.2.1}$$

and, if $\lambda \in \Lambda$, $\lambda' \in \Lambda$, then

$$\lambda \neq \lambda' \ \text{implies} \ \lambda \cap \lambda' = \emptyset ; \tag{6.2.2}$$

eq. (6.2.1) asserts that the subsets $\lambda$ exhaust S, and eq. (6.2.2)
states that no two of the subsets overlap. This situation is represented
in Figure 6.2. A partition M of S is said to be a refinement of
the partition $\Lambda$ if each $\mu \in M$ is contained in some
$\lambda \in \Lambda$. Thus, the

Now suppose that S is the set of elementary items of a complex
system and that $\Lambda$ is a partition of S. Just as the survival
distribution $R_S(t)$ is associated with the system itself, so too can a
survival distribution be associated with each set $\lambda$ belonging to
the partition. Each $\lambda$ is a collection of items, but $R_\lambda(t)$
will not in general be the product of the survival distributions of the
constituent items $s \in \lambda$ because of their interconnections.
Nevertheless there may be some partitions $\Lambda$ for which the survival
distributions assume a particularly convenient analytical form or, even
if they cannot be explicitly identified, have particularly convenient
properties.

One purpose of the decomposition and partition procedures discussed
in Chapters 2 and 7 is to define a convenient partition of the set of
parts of an airliner. The method described is applicable, in principle,
to any complex system. Various parts are amalgamated by their
interconnections and functional interdependence into components,
subassemblies, assemblies, and subsystems. Each of these is a natural
candidate for an element in a partition of S. If, for example, a
partition $\Lambda$ contains some subsystem $\lambda$, then the
subassemblies which constitute $\lambda$, together with the other
elements of $\Lambda$, define a refinement of $\Lambda$.
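
The two defining conditions of a partition, and the refinement relation, can be checked mechanically on a finite set. The following is a minimal sketch, not part of the original program: the function names and the toy item names are assumptions chosen only for illustration.

```python
from itertools import combinations


def is_partition(blocks, universe):
    """True if the blocks exhaust the universe (6.2.1) and are pairwise disjoint (6.2.2)."""
    blocks = [frozenset(b) for b in blocks]
    covers = frozenset().union(*blocks) == frozenset(universe)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    return covers and disjoint


def is_refinement(fine, coarse):
    """True if every block of `fine` is contained in some block of `coarse`."""
    coarse = [frozenset(c) for c in coarse]
    return all(any(frozenset(f) <= c for c in coarse) for f in fine)


# Hypothetical elementary items of a toy system (names are illustrative only).
items = {"lamp", "switch", "valve", "sensor", "strut"}
coarse = [{"lamp", "switch"}, {"valve", "sensor", "strut"}]
# Splitting one subsystem into its subassemblies yields a refinement.
fine = [{"lamp", "switch"}, {"valve", "sensor"}, {"strut"}]
```

Here `fine` is obtained from `coarse` exactly as in the text: one element of the partition is replaced by the subassemblies which constitute it, and the other elements are left unchanged.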


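Let us suppose that $\Lambda$ is a partition of S such that the survival
functions $R_\lambda(t)$ and $R_{\lambda'}(t)$ of any pair of distinct
elements $\lambda \neq \lambda'$, $\lambda, \lambda' \in \Lambda$, are
independent. Then

$$R_S(t) = \prod_{\lambda \in \Lambda} R_\lambda(t). \tag{6.3}$$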


A partition  enjoying  this  property  always  exists,  because  the  coarsest 
partition,  which  consists  of  the  single  set  S itself,  has  this  property. 
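
As a numerical illustration of the product rule for an independent partition, the sketch below multiplies the survival functions of the partition elements. The exponential survival laws, the rates, and the subsystem names are assumptions made for the example, not data from the program.

```python
import math


def system_survival(block_survival, t):
    """R_S(t) as the product of R_lambda(t) over independent partition elements."""
    return math.prod(r(t) for r in block_survival.values())


# Hypothetical exponential survival laws R(t) = exp(-rate * t); rates are illustrative.
blocks = {
    "reading_lights": lambda t: math.exp(-2e-4 * t),
    "cabin_pressurization": lambda t: math.exp(-5e-5 * t),
    "landing_gear": lambda t: math.exp(-1e-5 * t),
}

# System survival probability at t = 1000 hours under the independence assumption.
r_system = system_survival(blocks, 1000.0)
```

For the coarsest partition the dictionary would hold a single entry, and the product reduces to the system survival function itself.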

In principle, most is known about the survival characteristics of the
elementary parts from which S is ultimately constructed, and
progressively less is known about increasingly complex amalgamations of
the elementary parts. Therefore we seek that compromise partition whose
constituent subsets are as simple as possible, i.e., as close to the
elementary parts as possible, while still retaining the property that
the survival distributions of the elements of the partition are
independent, so that eq. (6.3) remains valid. That is, among all
partitions of S for which eq. (6.3) holds, we seek a partition $\Lambda$
such that if M is any proper refinement of $\Lambda$ (that is, any
refinement other than $\Lambda$ itself), then eq. (6.3) does not hold
for M. A partition $\Lambda$ which has this property will be said to be
maximally independent. It is clear that maximally independent partitions
exist but are not necessarily unique; that is, there may be more than one
way to select a maximally independent partition.


It is intuitively clear that a complex system such as an airliner
can be partitioned into independent (or at least very nearly independent)
subsystems according to this prescription. For instance, apart from a
common interdependence on the powerplant as an energy source, the
subsystem consisting of the collection of passenger reading lights is
independent of the cabin pressurization subsystem, the landing gear
assembly is independent of the flight control surfaces subsystem, and so
forth.


Hereafter we will assume that some maximally independent partition
$\Lambda$ has been selected. The next task is to associate a cost
function with this partition.


6.3 Let $C_\lambda(t)$ denote the sum of the expected cost of maintenance
and imputed cost of failure of the partition element
$\lambda \in \Lambda$ as a function of time t. $C_\lambda(t)$ includes
the cost of Hard Time replacements, of On Condition inspections and
replacements, of warehousing and distribution of replacement items, and
all other costs attributable to the maintenance function. It also
includes the imputed cost of failure of $\lambda$. For some partition
elements the cost of failure is negligibly greater than the cost of
renewal of $\lambda$. For instance, failure of the in-flight motion
picture system, a fairly frequent occurrence on some airlines, is at
worst an irritation which may influence passenger preference in a minor
way and thereby affect future passenger load factors to some slight
degree. Failure of other, safety-related, partition elements can entail
costs far greater than the cost of renewal of $\lambda$. If failure
aborts the mission or, in the case of airliners, causes loss of life
and/or loss of the entire system, then the imputed costs of the failure
constitute the principal driving force behind the design of the
maintenance policy. $C_\lambda(t)$ includes such costs.

Certain costs are not included in $C_\lambda(t)$ in what follows,
although they might find their place in a more comprehensive treatment
of our subject. In the case of commercial airliners, revenue-producing
costs, including advertising and non-maintenance personnel expenses, are
excluded from $C_\lambda(t)$.

Without loss of generality we may suppose that $C_\lambda(t)$ is the sum
of an absolutely continuous function (which represents, in part, the
imputed cost of failure) and a discrete part (which includes the costs of
periodic maintenance and renewal); recall the definitions given in
Section 2. It follows that the cost function $C_\lambda(t)$ possesses a
corresponding cost density (generalized) function $c_\lambda(t)$ (recall
eq. (2.30) and the related discussion of the density associated with a
discrete distribution). Then

$$C_\lambda(t) = \int_0^t c_\lambda(\tau)\,d\tau . \tag{6.4}$$

If $\lambda$ never fails, then $C_\lambda(t)$ essentially reduces to the
cost of Hard Time and On Condition maintenance through time t, and can be
approximated by a step function. If the probability of failure is not
zero, then it will be more useful to express the maintenance/failure cost
$C_\lambda(t)$ in terms of the failure distribution $F_\lambda(t)$. If
$\lambda$ is maintained in accordance with a Hard Time policy without
inspection, then the associated renewal cost density will be proportional
to the number of items which survive until the replacement time. If that
time is $t_1$, then the corresponding cost density is proportional to
$\delta(t - t_1)\,R_\lambda(t)$, where $\delta(t - t_1)$ is the Dirac
delta function (cp. Eq. (2.27)). If $\lambda$ is maintained in accordance
with an On Condition policy, then costs will be incurred at every
inspection. If inspection times are $t_1, t_2, \ldots$, then the cost
density will be of the form

$$\sum_i c^m_{\lambda,i}(t)\,\delta(t - t_i)\,R_\lambda(t), \tag{6.5}$$

where $c^m_{\lambda,i}(t)$ denotes the cost of maintenance of $\lambda$
at time $t_i$.

Finally, if $\lambda$ is maintained in accordance with a Condition
Monitoring process, or if $\lambda$ actually fails, then the
corresponding cost density will be proportional to the failure density,
$p_\lambda(t) = -dR_\lambda(t)/dt$. It follows that the general cost
density has the form

$$c_\lambda(t) = c^f_\lambda(t)\,p_\lambda(t) + \sum_i c^m_{\lambda,i}(t)\,\delta(t - t_i)\,R_\lambda(t). \tag{6.6}$$

The imputed costs of failure are represented by $c^f_\lambda(t)$.
Eq. (6.6) shows that failure cost density is proportional to the failure
density and that other maintenance costs are proportional to the survival
distribution. In order to make these expressions comparable, we will
express the survival distribution in terms of the hazard rate
$h_\lambda(t)$ and the failure density. From eq. (3.14) we have
$R_\lambda(t) = p_\lambda(t)/h_\lambda(t)$, so

$$c_\lambda(t) = \left\{ c^f_\lambda(t) + \sum_i \frac{c^m_{\lambda,i}(t)\,\delta(t - t_i)}{h_\lambda(t)} \right\} p_\lambda(t) \overset{\text{def}}{=} \gamma_\lambda(t)\,p_\lambda(t), \tag{6.7}$$

where we have written

$$\gamma_\lambda(t) = c^f_\lambda(t) + \sum_i \frac{c^m_{\lambda,i}(t)\,\delta(t - t_i)}{h_\lambda(t)} .$$


If the frequency of inspections is large compared with the frequency
of failures, then the delta functions can be approximated by linear
interpolation. This amounts to the assumption that inspection costs are
expended uniformly with time rather than at a discrete set of times.

The functions $c^f_\lambda(t)$ and $c^m_{\lambda,i}(t)$ are costs, hence
positive functions. This fact will be used in what follows.
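
To make the structure of the cost density concrete, the sketch below integrates it for a single partition element: the failure term is integrated numerically, and each delta function contributes its weight, the inspection cost times the survival probability at the inspection time, once that time has passed. The function name, the constant failure cost, the exponential failure law, and the inspection schedule are all assumptions chosen for illustration.

```python
import math


def cumulative_cost(t, c_fail, density, survival, inspections, n_steps=10_000):
    """Cost through time t: integral of c_fail * density plus discrete inspection costs.

    `inspections` is a list of (t_i, cost_i) pairs; an inspection at t_i <= t
    contributes cost_i * survival(t_i), the integral of its delta-function term.
    """
    dt = t / n_steps
    grid = [k * dt for k in range(n_steps + 1)]
    vals = [c_fail(x) * density(x) for x in grid]
    # Trapezoidal rule for the absolutely continuous (failure) part.
    failure_part = dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    # Discrete part: one jump per inspection that has already taken place.
    discrete_part = sum(c * survival(t_i) for t_i, c in inspections if t_i <= t)
    return failure_part + discrete_part


# Hypothetical element with constant hazard rate h, so R(t) = exp(-h t), p(t) = h R(t).
h = 1e-3
cost_250 = cumulative_cost(
    250.0,
    c_fail=lambda x: 500.0,                   # assumed constant imputed failure cost
    density=lambda x: h * math.exp(-h * x),
    survival=lambda x: math.exp(-h * x),
    inspections=[(100.0, 10.0), (200.0, 10.0), (300.0, 10.0)],
)
```

The resulting function of t is a smooth curve with an upward jump at each inspection time, which is the decomposition into an absolutely continuous part and a discrete part described above.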

The total maintenance/failure cost of the system S as a function
of time will be

$$C(t) = \sum_{\lambda \in \Lambda} C_\lambda(t) = \sum_{\lambda \in \Lambda} \int_0^t \gamma_\lambda(\tau)\,dF_\lambda(\tau), \tag{6.8}$$

where we have written $dF_\lambda(t)$ in place of $p_\lambda(t)\,dt$. As
we have already remarked above, the main objective of the
Reliability-Centered Maintenance Program is to minimize the value of C(t)
for each time t, given the history of the system for times $t' < t$.


6.4 The problem of minimizing C(t) is still too complicated to admit
a mathematical solution even if all the quantities involved were precisely
known. Nevertheless, a simple observation provides the key for
implementation of a systematic iterative procedure which acts to reduce
C(t) if the latter is not already a local minimum.

Since $\Lambda$ is a maximally independent partition, it will not be
possible to reduce C(t) by passage to a refinement M for which the
survival distributions $R_\mu(t)$, $\mu \in M$, are independent. This
means that C(t) need not be the globally minimal maintenance/failure cost
for the system even though it may be minimal for the collection of all
maximally independent partitions. Furthermore, the local minimum (subject
to the constraint of maximal independence of the partition) depends on
the choice of partition. There is no guarantee that minimization of C(t)
for the given maximally independent partition $\Lambda$ is the same as
minimization of C(t) over the class of all maximally independent
partitions of the system. Thus we must again conclude that the minimum
which will be attained by the procedure about to be described, hence also
the minimum attained by the Reliability-Centered Maintenance Program, is
not necessarily global. Nevertheless, experience suggests that the
minimum achieved in the application of the Program to commercial airline
operations may be close to the global minimum and, in any event, partial
minimization of C(t) by application of the policies introduced below
leads to significant reductions in the value of C(t) in practical
situations.

Returning now to eq. (6.8), observe that $\gamma_\lambda(t) \ge 0$. Then
C(t) is a finite sum (over the elements of the partition $\Lambda$) of
integrals

$$\int_0^t \gamma_\lambda(\tau)\,p_\lambda(\tau)\,d\tau$$

whose integrands are products of non-negative functions. Consequently,
the total cost C(t) through time t will be reduced if one or more
of the following three possibilities occurs:

I.   For some $\lambda \in \Lambda$ there is a maintenance policy which
     replaces the failure cost density $c^f_\lambda(t)$ by a failure
     cost density $c^f_\lambda(t)^*$ such that

       $c^f_\lambda(t)^* \le c^f_\lambda(t)$ for all t and
       $c^f_\lambda(t)^* < c^f_\lambda(t)$ for t in some open interval.

II.  For some $\lambda \in \Lambda$ there is a maintenance policy which
     replaces the maintenance cost function $c^m_{\lambda,i}(t)$ by a
     maintenance cost function $c^m_{\lambda,i}(t)^*$ such that

       $c^m_{\lambda,i}(t)^* \le c^m_{\lambda,i}(t)$ for all t and
       $c^m_{\lambda,i}(t)^* < c^m_{\lambda,i}(t)$ for t in some open
       interval.

III. For some $\lambda \in \Lambda$ there is a maintenance policy which
     replaces the product $\gamma_\lambda(t)\,p_\lambda(t)$ by a product
     $\gamma^*_\lambda(t)\,p^*_\lambda(t)$ such that

       $\gamma^*_\lambda(t)\,p^*_\lambda(t) \le \gamma_\lambda(t)\,p_\lambda(t)$ for all t and
       $\gamma^*_\lambda(t)\,p^*_\lambda(t) < \gamma_\lambda(t)\,p_\lambda(t)$ for t in some open interval,

     and neither I nor II is applicable.

Maintenance policies of Type I occur when an item is redesigned to
incorporate redundancy or other fail-safe design methods which act to
reduce the cost of failure of the initial item without necessarily
affecting its probability of failure. This type of policy change tends
to apply to modifications of equipment design rather than to modifications
of operational maintenance procedures.

Type II policies are indifferent to survival distributions and
therefore are really independent of the properties of the equipment being
maintained. They are principally managerial or organizational policies
concerned with matters such as scheduling of periodic maintenance tasks,
location of depots, provision of replacement parts in adequate number to
reduce downtime revenue loss while avoiding costs associated with
excessive replacement parts stock, and so forth. Optimal Type II policies
can be difficult to identify and implement, but their nature and importance
have always been understood by managers and cost accountants. Nevertheless,
the large costs of critical failures cannot, in typical situations, be
counterbalanced by efficiencies from Type II decisions, that is, without
modification of the survival distribution or the cost of failure.

The most significant opportunities for the introduction of maintenance
policies which reduce C(t) are of Type III, which can be further
categorized into three subtypes. Using the notations and constraints
given in III, they can be expressed as follows:

IIIA. $\gamma^*_\lambda(t) \ge \gamma_\lambda(t)$ and $p^*_\lambda(t) \le p_\lambda(t)$ for all t;

IIIB. $\gamma^*_\lambda(t) \le \gamma_\lambda(t)$ and $p^*_\lambda(t) \ge p_\lambda(t)$ for all t;

IIIC. Neither of the above.

For either of the first two conditions there will be some open
interval on which strict inequality obtains because of the condition

$$\gamma^*_\lambda(t)\,p^*_\lambda(t) < \gamma_\lambda(t)\,p_\lambda(t)$$

in III. It is possible that there will be some intervals where
$p^*_\lambda(t) < p_\lambda(t)$ and others where
$p^*_\lambda(t) > p_\lambda(t)$ compatible with III;
these cases are subsumed under IIIC.
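
The subtype conditions can be read as a small decision procedure. The sketch below is an illustration only: it assumes that the cost factor and the failure density, before and after a policy change, have been sampled on a common finite time grid, so that "for all t" and "on some open interval" are approximated by "at every sample" and "at some sample". It does not test whether policies of Type I or II are applicable. The function name and the sample values are hypothetical.

```python
def classify_type_iii(gamma_old, p_old, gamma_new, p_new):
    """Classify a policy change from values sampled on a common time grid.

    Returns "IIIA", "IIIB", "IIIC", or None when the product condition of
    Type III fails (the new product exceeds the old somewhere, or never improves).
    """
    old = [g * p for g, p in zip(gamma_old, p_old)]
    new = [g * p for g, p in zip(gamma_new, p_new)]
    never_worse = all(n <= o for n, o in zip(new, old))
    somewhere_better = any(n < o for n, o in zip(new, old))
    if not (never_worse and somewhere_better):
        return None
    gammas = list(zip(gamma_new, gamma_old))
    densities = list(zip(p_new, p_old))
    # IIIA: cost factor not decreased, failure density not increased.
    if all(g >= g0 for g, g0 in gammas) and all(p <= p0 for p, p0 in densities):
        return "IIIA"
    # IIIB: cost factor not increased, failure density not decreased.
    if all(g <= g0 for g, g0 in gammas) and all(p >= p0 for p, p0 in densities):
        return "IIIB"
    return "IIIC"


# Hypothetical sampled values for one partition element.
gamma0 = [10.0, 10.0, 10.0]
p0 = [0.02, 0.02, 0.02]
case_a = classify_type_iii(gamma0, p0, [12.0, 12.0, 12.0], [0.01, 0.01, 0.01])
case_b = classify_type_iii(gamma0, p0, [4.0, 4.0, 4.0], [0.03, 0.03, 0.03])
case_c = classify_type_iii(gamma0, p0, [12.0, 4.0, 10.0], [0.01, 0.03, 0.015])
```

In `case_a` more is spent per failure-weighted unit but failures become rarer; in `case_b` failures become somewhat more frequent but far cheaper to deal with; `case_c` mixes the two over time.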

In circumstances where IIIA is applicable the reduction in the
failure probability density may result in an increase in maintenance
costs. Nevertheless, if a failure of the item in question is critical
with a corresponding large cost of failure density $c^f_\lambda(t)$, then
the product $\gamma_\lambda(t)\,p_\lambda(t)$ will generally be reduced,
often by a substantial amount. Maintenance policies of this type
correspond to situations where a judicious additional investment in an
appropriate maintenance action results in a significant decrease in the
failure density for items which are associated with large failure costs.
Essential for the effective introduction of Type IIIA maintenance
policies is an evaluation of failure modes and the consequences of
failure. Based upon such information, maintenance policies of Type IIIA
can act to reduce C(t) by introducing a redefinition of an unsatisfactory
condition (cp. the discussion in Chapter 3), that is, a failure, in order
to convert functional failures (especially critical failures) into
non-functional failures. This conversion will normally be accomplished by
introducing instrumentation or various inspection and monitoring
activities, each of which adds to maintenance cost, but the increase in
maintenance cost is offset by the reduction in the expected cost of
failure.

Policies of Type IIIB are particularly effective when applied to
non-significant items (cp. the discussions of significant items and
Condition Monitoring maintenance in Chapter 8). They decrease
$\gamma_\lambda(t)$ while possibly increasing the failure density
$p_\lambda(t)$ in a manner which decreases the product of these two
functions. If the failures of an item are not significant, then there
generally is no compelling reason to implement either a Hard Time or an
On Condition maintenance policy. By placing such items in the Condition
Monitoring category, Type IIIB cost reductions can be obtained. In
effect, this means that the failure cost density $c^f_\lambda(t)$ reduces
to the cost of replacing the failed item. If this is less than the cost
of maintenance over the lifetime of the item, then the cost density
product is reduced by implementing this policy. For example, a
maintenance policy which periodically dismantled and renewed seat
recliners would be relatively costly compared with the imputed cost of a
recliner failure. Consequently, although the failure density might be
increased thereby, a revised policy which merely monitored the condition
of the recliners by establishing a mechanism to report users' complaints
would certainly reduce $C_\lambda(t)$ and thus C(t) itself. It is of
particular importance to seek those elements of the partition for which
scheduled maintenance policies result in greater values of $C_\lambda(t)$
than would Condition Monitoring (i.e., surveillance) policies, either
because maintenance processes do not reduce the failure density (e.g., if
the associated hazard rate is non-increasing) or because the cost of
reduction by maintenance is greater than the imputed added cost of
failure through lack of scheduled maintenance. The Decision Diagram
technique of the Reliability-Centered Maintenance Program provides an
explicit means for identification of partition elements to which
Type IIIB policies can be applied.

It may happen that application of policies I - III decreases C(t)
but that the new cost function is not minimal. Less expensive
conversions of functional to non-functional failures, longer inspection
intervals and Hard Time renewal intervals may be recognized as beneficial
at some subsequent time. New information may become available as a
result of experience or testing or theoretical advances. Equipment will
generally evolve, and constituent items will be replaced by others with
different (but not always more favorable) reliability characteristics.
Each of these occurrences may provide a cost-effective reason to apply
the policies I - III again, thereby bringing the maintenance/failure cost
function closer to a local minimum. The history of the iterated
application of maintenance policies of Types I to III will typically,
when conceived as one grand maintenance policy, be of Type IIIC: neither
the cost densities nor the failure densities exhibit monotone decreasing
behavior as time increases, but the policy nevertheless achieves an
overall cost reduction at each stage of the iteration.

A simple geometric interpretation of this procedure can be readily
visualized. Consider the maintenance/failure cost function C(t) as a
function of the various parameters which determine a maintenance policy.
These would include Hard Time replacement intervals, the reliability
distributions of the parts, and so forth. As a function of these variables
and for each time t the maintenance/failure cost function determines a
hypersurface in a multidimensional euclidean space. This surface has
the property that the total cost function is positive for each time t.
The maintenance policy designer seeks a curve on this surface which
depends on t such that for each fixed value of t, the curve passes
through the minimum point on the hypersurface corresponding to that time.
In more picturesque language, the desired maintenance policy is represented
by a curve which passes through the lowest points of the deepest valley
of the cost hypersurface. The policies I - III are valley-seeking; with
each application, they direct the curve further downward into a valley.

7.  INFORMATION  AND  MAINTENANCE  PROGRAMS

7.1 Critical failures of large-scale complex systems are generally
extremely costly; consequently, a maintenance policy which attempts to
minimize total costs must also attempt to minimize the number of critical
failures. Thus, an effective maintenance program will of necessity be
reliability-centered. The more effective the program is, the fewer
critical failures will occur, and correspondingly less information about
operational failures will be available to the maintenance policy designer.
It is in this sense that the objective of the maintenance policy designer
can be thought of as an attempt to minimize information, and that the most
successful policy yields no information whatsoever about critical failures
because it precludes their occurrence. That the optimal policy must be
designed in the absence of critical failure information, utilizing only
the results of component tests and prior experience with related but
different complex systems, is an apparently paradoxical situation.
Moreover, the applicability of statistical theories of reliability to the
very small populations of large-scale complex systems typically encountered
in practice is questionable and calls for some discussion. Each of these
distinct viewpoints leads to the conclusion that maintenance policy design
is necessarily conducted with extremely limited information of dubious
reproducibility, and we must consider why it is nevertheless possible,
and how it can be done. The following two subsections take up these
questions in turn.

Recall the geometric interpretation of the Reliability-Centered
Maintenance Program given at the end of Section 6. For each fixed time
t the maintenance/failure cost function can be considered as a function
of the various parameters whose selection specifies a maintenance policy.
This function defines a hypersurface in some multi-dimensional euclidean
space. Since costs are necessarily non-negative, the cost function will
attain its minimum value at some point(s) of the surface; we may say that
such a point is the lowest point of a valley on the surface. The
Reliability-Centered Maintenance Program is designed to seek the lowest
point in some valley on the surface, for each time t.

Denote the surface associated with time t by $S_t$. If the variation
of t is identified with a variable point on a line, then the
individual surfaces can be stacked one next to the other to form a set

$$S = \{ S_t : 0 \le t < \infty \} ; \tag{7.1}$$

S need not be a smooth surface itself because discontinuous
modifications of equipment may introduce discontinuities in S as t
increases. For the sake of discussion, let us assume that S itself is a
surface (of dimension 1 greater than the dimension of each $S_t$). The
optimal maintenance policy at time t is one which corresponds to a local
minimum, i.e., a lowest valley point, on $S_t$. Combining these as t
varies, one obtains a lowest valley point on $S_t$ for each t. These
points need not trace out a curve on S because changes of maintenance
policy can correspond to a "jump" from the lowest point in one valley on
$S_t$ to the lowest point in some other valley on $S_t$. Nevertheless, it
is impossible to implement more than a finite number of policy changes in
a finite time interval, so that an optimal Reliability-Centered
Maintenance Program corresponds to a finite number of curves lying on S,
each of the form $t \to f(t)$, with f(t) a point in $S_t$ which is the
lowest point in some valley on $S_t$. Thus, as t increases, the point
f(t) which corresponds to a solution of the maintenance problem traces
out a curve which runs along the floor of a valley in S, possibly
jumping, from time to time, from one valley to another.

The mathematical problem which corresponds to this description
consists of locating the minima of $S_t$ as t varies. If the equation
which defines S is known, then this problem can in principle be solved
by applying the methods of advanced calculus. In practice, were the
defining equation known, the number of independent variables entering
into it would be so great as to preclude an explicit analytical solution
of the problem. In any event, for reasons already cited and discussed
in detail throughout [6], the defining equation cannot be known because
the available information is insufficient.

The defining equation of S contains all possible relevant
information about the consequences of all conceivable maintenance
policies. This is surely much more information than is actually needed
either to specify a locally minimal (valley floor) curve or even to
locate one. Indeed, if p denotes a point of S and if any downward
direction at p is known, i.e., a direction for which the directional
derivative at p is negative, then a small displacement from p along the
surface in that downward direction leads to a nearby point, say q, for
which the maintenance/failure cost is strictly less than the
maintenance/failure cost corresponding to p. Observe that this procedure
merely requires information about the cost benefits of policies which
differ little from the policy corresponding to p: we may say that this
procedure only requires information about policies in a small
neighborhood of the policy p. Such information is the most likely to be
available, or estimable, in practice. Moreover, this procedure does not
even require full information about all policies in a small neighborhood
of p; it suffices to know one direction which leads to cost reduction.
In this sense, we may construe the Reliability-Centered Maintenance
Program as a well-defined procedure for identifying directions on S
which tend downward, i.e., reduce maintenance/failure cost.
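
The local character of this procedure can be illustrated with a small search routine that never inspects the whole cost surface: at each stage it compares the current policy only with a few nearby policies and moves as soon as one of them is cheaper. This is a sketch under stated assumptions, not the Program itself; the two policy parameters, the quadratic cost function, and the function names are hypothetical.

```python
def downward_neighbour(cost, policy, step):
    """Return a nearby policy with strictly lower cost, or None if none is found.

    Only policies differing from `policy` by +/- step in one parameter are examined.
    """
    current = cost(policy)
    for i in range(len(policy)):
        for delta in (step, -step):
            trial = list(policy)
            trial[i] += delta
            if cost(trial) < current:
                return trial
    return None


def descend(cost, policy, step, max_iter=1000):
    """Repeatedly move to a cheaper neighbouring policy until none is found."""
    policy = list(policy)
    for _ in range(max_iter):
        better = downward_neighbour(cost, policy, step)
        if better is None:
            break  # no downward direction detected at this step size
        policy = better
    return policy


def toy_cost(x):
    # Hypothetical cost surface with a single valley floor at (3, -1).
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2 + 5.0


final_policy = descend(toy_cost, [0.0, 0.0], step=0.25)
```

Each accepted move lowers the cost, and the routine needs to know only that some neighbouring policy is cheaper, not by how much nor what the surface looks like elsewhere.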

The rapidity with which the floor of the valley is reached by this
process depends on the size of the step taken in the downward direction.
If the step size is smaller than necessary, it will take more steps to
reach the valley floor, so that greater than necessary maintenance/failure
costs will be borne: unnecessary maintenance activities will have been
supported, avoidable failures will have been experienced. If the step
size is too large, then the maintenance policy may leap from one valley
wall to another, unable to detect the floor, and producing an oscillating
policy which can, in unfavorable circumstances, produce successively
greater maintenance/failure costs and ultimately oscillate among local
maxima. The choice of step size is critical, as has been implicitly
recognized in the conservative federal guidelines concerning extension of
Hard Time replacement intervals in commercial airline maintenance policies.
It is clearly preferable to select a step size smaller than optimal
instead of one larger than optimal because the consequences of the former
vary continuously with step size whereas small changes in the latter can
produce large and unanticipated cost increases. It must also be
recognized that the size of the optimal step depends on its location on
the surface S, or, to put it more picturesquely, it depends on how
"wrinkled" the surface is in the neighborhood of the point from which the
step is taken. If the surface slopes gradually and gently downward toward
the valley floor, then a larger step will be admissible than is the case
when the step-off point lies at the top of a steep cliff overlooking the
valley. Determination of the optimal step size is a more difficult problem
than is determination of a direction in which the step should be taken
because the former implicitly requires some estimate of the magnitude of
the directional derivative whereas the latter merely utilizes the sign of
that derivative. Suppose that there is reason to believe that the absolute
value of the directional derivative is bounded by a known constant on the
entire surface S. This information enables one to establish a maximum
step size such that maintenance/failure cost increases as the result of
over-stepping are held below some prearranged value. Hypotheses about
the maximum absolute value of directional derivatives can be based upon
prior experience; relative to a maximum step size determined this way,
the assertion of some reliability engineers that "there are no cliffs" in
hazard functions and other reliability measures is given a precise
mathematical interpretation.
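
Both remarks about step size admit a short numerical illustration. If the absolute value of the directional derivative is bounded by K, then a displacement of length h cannot raise the cost by more than K h, so any step no longer than (tolerated increase)/K keeps an overstep within tolerance. The oscillation caused by an excessive step is shown on a one-dimensional toy valley with cost x squared, where each move is taken against the slope with a length proportional to it. The valley, the multipliers, and the function names are assumptions of this sketch.

```python
def max_safe_step(derivative_bound, tolerated_increase):
    """Largest displacement whose worst-case cost increase stays within the tolerance."""
    return tolerated_increase / derivative_bound


def costs_along_descent(step, x0=1.0, n=5):
    """Costs visited by x -> x - step * f'(x) on the toy valley f(x) = x**2."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] - step * 2.0 * xs[-1])  # f'(x) = 2x
    return [x * x for x in xs]


small = costs_along_descent(0.1)  # approaches the valley floor from one side
large = costs_along_descent(1.1)  # leaps from wall to wall, climbing each time
```

With the small multiplier the cost falls at every move; with the large one the iterates alternate in sign and the cost grows at every move, which is the oscillating behavior described above.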

In  summary,  although  the  maintenance  policy  designer  has  little 
information  at  his  disposal  regarding  the  precise  nature  of  the  mainte- 
nance/failure cost  surface,  creation  of  an  iterative  minimum-seeking 
policy  only  requires  enough  information  to  identify  downward-tending 
directions  in  the  neighborhood  of  an  existing  policy,  and  to  establish 
an  upper  bound  for  step  size  in  order  to  avoid  overstepping. 
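
The two requirements in this summary can be illustrated with a minimal sketch. It assumes a one-parameter policy and a hypothetical cost function `toy_cost`; neither comes from the source. If the absolute value of the directional derivative is bounded by K, a step of length h changes the cost by at most K*h, so h = eps/K keeps any cost increase from over-stepping below a tolerated value eps. The descent loop uses only the sign of the local slope.

```python
def max_step(tolerated_increase, derivative_bound):
    """Largest step h with derivative_bound * h <= tolerated_increase."""
    if derivative_bound <= 0.0:
        raise ValueError("derivative_bound must be positive")
    return tolerated_increase / derivative_bound


def descend(cost, p, step, probe=1e-6, max_iter=100_000):
    """Walk downhill in fixed steps, using only the sign of the slope."""
    for _ in range(max_iter):
        slope = cost(p + probe) - cost(p - probe)
        if slope == 0.0:
            break
        candidate = p - step if slope > 0.0 else p + step
        if cost(candidate) >= cost(p):
            break  # the next step would over-step the minimum
        p = candidate
    return p


def toy_cost(p):
    """Hypothetical cost curve; on 0 <= p <= 6 its slope satisfies |C'| <= 3."""
    return 0.5 * (p - 3.0) ** 2


step = max_step(tolerated_increase=0.03, derivative_bound=3.0)
p_final = descend(toy_cost, p=0.0, step=step)
print(step, p_final)
```

The loop stops within one step of the minimum; a smaller tolerated increase gives a smaller step and a slower but safer approach.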

It is generally impossible to adequately test most large-scale
complex systems because so few replicas are built and the time needed to
test one system at the desired confidence level often approximates the
expected lifetime of the system type prior to obsolescence. Simple
systems are also subject to the latter problem if high reliability is
demanded and technology is rapidly varying. For instance, MIL-STD-690A,
"Life Test Sampling Procedures for Established Levels of Reliability and
Confidence in Electronic Part Specifications," proposed in 1965, required
zero test failures in 230 million part hours to meet a standard of 0.001%
failures per thousand hours at the 90% confidence level. Testing as
many as 16,000 parts simultaneously would require 4.5 years of testing
24 hours per day. But recent electronic technology has been consistently
undergoing major revolutions at intervals of approximately 5 years. We
must conclude that a product which has been adequately tested according
to conventional standards may be obsolete by the time it satisfies the
testing criteria. Thus, complexity of equipment and high performance
requirements conspire to eliminate the possibility of observing the
survival characteristics of system replicas in sufficient quantity for
statistical analysis of sample variation to be a valuable guide.
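
The figure of 230 million part hours can be checked with a short calculation. The sketch below assumes a constant failure rate (an exponential life model, which is an assumption of this example and is not stated in the passage): with no failures observed in H part hours, a failure rate r is demonstrated at confidence level c once exp(-r*H) <= 1 - c.

```python
import math


def zero_failure_part_hours(failure_rate_per_hour, confidence):
    """Part hours of failure-free testing needed to demonstrate the rate.

    Assumes a constant failure rate, so that the probability of seeing no
    failure in H part hours is exp(-rate * H).
    """
    return -math.log(1.0 - confidence) / failure_rate_per_hour


# 0.001% failures per thousand hours, expressed per part hour.
rate = 0.001 / 100.0 / 1000.0
part_hours = zero_failure_part_hours(rate, confidence=0.90)
print(f"{part_hours:.3e}")
```

The result is about 2.3e8 part hours, which agrees with the requirement quoted above.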

Although it is common to view statistics as an analytical arsenal
for the description of observed variations in large samples of homologous
items subjected to similar environmental stresses, there is another, more
profound, view introduced into statistical mechanics by J. Willard Gibbs.
Prior to Gibbs, the application of statistical methods to the study of
physical reality was beset with philosophical problems arising from the
irrefutable observation that there is but one universe, not an ensemble of
universes the variation of whose properties statistics would describe.
It was Gibbs who conceived the fruitful notion of a virtual ensemble of
potential universes upon which statistical analysis acted to select one,
the one that exists, as a kind of solution to a variational problem,
the problem of maximizing expectation. In this way statistics is applied
as a cardinal principle in our model of nature, on somewhat the same
footing as Newton's Laws, to determine which among the conceivable
universes shall occur; it is not a descriptive tool to provide a measure
of observed variation. Elevated to a principle, statistics nevertheless
cannot determine the course of nature without additional information,
just as application of Newton's Laws requires knowledge of the appropriate
force function.


The  statistics  of  traditional  reliability  theory  has  few  points  of 
contact  with  the  Gibbsian  interpretation;  it  is  woven  together  with 
product  sampling  and  age  exploration.  When  these  are  not  possible,  when 
the system is complex, unreplicated, and rapidly becomes obsolete, then
application  of  statistics  as  a means  for  the  analysis  of  variation  must 
yield  to  the  Gibbsian  role  of  statistics  as  a selection  principle. 

These  remarks  neither  solve  any  problem  of  reliability  nor  yield 
profound  insights.  But  they  perhaps  suggest  a philosophical  foundation 
upon which an acceptable theory of the application of statistics to the
reliability of complex systems can be developed.

7.4 Recalling the ideas and notations of Section 7.2, we recognize that
the step size used in implementing the Reliability-Centered Maintenance
Program depends on the policy selected and also on the time of selection
of the policy. A point on the maintenance/failure cost surface
corresponding to time t is specified by the policy parameters, which will
be collectively denoted by p(t), and the corresponding cost, say
C(t, p(t)). Thus the corresponding point on the surface for time t has
coordinates (p(t), C(t, p(t))); and, when it is considered as a point on
the full policy surface S, its coordinates are (t, p(t), C(t, p(t))) with
time as an explicit variable. Selection of a step is the same thing as
selection of a pair of points on S, say (t', p'(t'), C(t', p'(t'))) and
(t", p"(t"), C(t", p"(t"))). The time variable plays its usual distinctive
role since it is subject to unicursal variation: time always increases.
This implies that of two applications of the minimizing maintenance
policies (I) - (III) of Section 6, one will always be antecedent to the
other; we can suppose, without loss of generality, that t' < t". It may
happen that the policy p remains unchanged from t' until t": that is, a
review of policy may not bring forth sufficient reasons to implement a
policy change. The process of review, and the process of implementation
of a policy change, may be costly, which is an inducement to extend the
interval t" - t' between successive reviews or changes as much as possible.
Counterbalancing this argument is the possibility that a review will lead
to a substantial cost decrease, i.e., that there will be sufficient
information to enable the size and direction of the next step in the
iterative minimization procedure to be determined. For these reasons
the problem of determining the step size in the time variable, that is,
of determining the interval t" - t' between successive applications of
the policies (I) - (III) to the system, assumes a particularly significant
role. An intensively studied special case of this problem is concerned
with the extension of Hard Time replacement intervals for equipment
as experience accumulates.

Determination of the optimal intervals for application of the
Reliability-Centered Maintenance Program policies appears to be a
particularly difficult problem, depending as it does on both the conversion of
operating experience into information about the survival distributions
of the elements of the partition of the system, and on the effect this
information should or would have on those who bear the responsibility for
making policy changes such as increasing replacement or inspection
intervals. We have already noted that larger than optimal step sizes can
lead to wild oscillations in maintenance/failure costs and to an
increasing number of critical failures, whereas smaller than optimal step sizes,
which can also be called conservative estimates, merely reduce the rate
of approach to the optimal policy. This is a persuasive argument for a
conservative implementation of a maintenance program. Excessive
conservatism, however, is often too costly and retards the evolution of
related systems. It is therefore worthwhile to try to formulate the
decision process in a manner which makes it subject to analysis.

One way to formalize this problem of interval determination is based
upon its connection with information theory. Let $t_1, \dots, t_n$ be a
sequence of inspection or replacement times for samples of a type of
item. Let R(t) be the observed survival distribution and U the
universe of sample items. If $\xi \in U$ is an item, it will age and finally
fail at some time $t(\xi)$. Let

$$ \omega(t_i) = \{\xi : t_{i-1} \le t(\xi) < t_i\} , \qquad i = 1, \dots, n, \quad \text{with } t_0 = 0 ; \tag{7.2} $$

that is, let $\omega(t_i)$ denote the set of items which fail before the i-th
inspection but not before the (i-1)-th inspection. The sets
$\{\omega(t_i) : i = 1, 2, \dots, n\}$ constitute a partition of U. The probability
that $\xi \in \omega(t_i)$ is $R(t_{i-1}) - R(t_i)$. The information associated with the
partition is (cp. [2], [11])

$$ I(\Omega) = -\sum_{i=1}^{n} \bigl(R(t_{i-1}) - R(t_i)\bigr) \log_e \bigl(R(t_{i-1}) - R(t_i)\bigr) . \tag{7.3} $$

Passing to continuous variables, this corresponds to (cp. [11])

$$ I(\Omega) = -\int_0^\infty p(t) \log_e p(t)\,dt . \tag{7.4} $$


Note that the information defined by eq. (7.4) depends on the coordinate
system;  it  is  not  independent  of  transformations  of  the  time  variable, 
among  which  selection  of  zero  time  is  included.  In  particular,  the 
information  corresponding  to  a continuous  survival  probability  density 
may be negative. Information differences do have absolute meaning,
independent  of  coordinate  transformations. 
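
These three remarks can be checked numerically. The sketch below is an illustration only: it assumes an exponential density p(t) = exp(-t/T)/T and evaluates the integral of eq. (7.4) with a simple midpoint rule; the function name and the quadrature settings are choices made for this example. For this density direct integration gives 1 + log_e T, so the information is negative for small T, it shifts by log_e 24 when time is measured in hours instead of days, and the difference between two distributions is the same in either unit.

```python
import math


def differential_information(T, steps=100_000, span=60.0):
    """Midpoint-rule value of -integral p(t) log p(t) dt on [0, span*T]
    for the exponential density p(t) = exp(-t/T) / T."""
    h = span * T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        p = math.exp(-t / T) / T
        total -= p * math.log(p) * h
    return total


info_unit_mean = differential_information(1.0)    # close to 1
info_short_mean = differential_information(0.1)   # negative

# The same two distributions expressed in days and in hours.
diff_days = differential_information(2.0) - differential_information(1.0)
diff_hours = differential_information(48.0) - differential_information(24.0)
unit_shift = differential_information(24.0) - differential_information(1.0)
print(info_unit_mean, info_short_mean, diff_days, diff_hours, unit_shift)
```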

It can be shown that, among all differentiable survival probability
densities which have the same mean time before failure T,

$$ T = \int_0^\infty t\,p(t)\,dt , \tag{7.5} $$

the exponential survival distribution, for which

$$ p(t) = \frac{1}{T} \exp(-t/T) , \tag{7.6} $$

maximizes the information eq. (7.4). A simple calculation shows that in
this case


$$ I = 1 + \log_e T . \tag{7.7} $$

The information corresponding to the exponential distribution and
inspection intervals of equal duration can be easily calculated. Let the
inspection times be

$$ t_i = iqT , \qquad i = 0, 1, \dots , \tag{7.8} $$

where T denotes the mean time before failure and q is a positive
constant. The inspection intervals have common duration $t_{i+1} - t_i = qT$,
and the survival distribution is given by eq. (7.6). From the formulae


$$ \sum_{i=0}^{\infty} x^i = \frac{1}{1-x} $$

and

$$ \sum_{i=1}^{\infty} i\,x^i = \frac{x}{(1-x)^2} , $$

each valid for -1 < x < 1, we find, from eq. (7.3) with
$R(t_{i-1}) - R(t_i) = (1 - e^{-q})\,e^{-(i-1)q}$:

$$ \begin{aligned}
I = I(q) &= (1 - e^{-q}) \sum_{i=0}^{\infty} q\,i\,e^{-iq} - (1 - e^{-q}) \log_e(1 - e^{-q}) \sum_{i=0}^{\infty} e^{-iq} \\
&= \frac{q}{e^{q} - 1} - \log_e(1 - e^{-q}) .
\end{aligned} \tag{7.9} $$

As the inspection interval tends to zero, the discrete formula eq. (7.3)
does not pass over to formula eq. (7.4) corresponding to the information
associated with an absolutely continuous distribution, so one should
not expect that $\lim_{q \to 0} I(q)$ will reduce to eq. (7.7); instead we find

$$ I(0) = \lim_{q \to 0} I(q) = \infty ; \tag{7.10} $$

periodic inspection with zero interinspection interval produces infinite
information for the exponential distribution. At the other extreme, if
there are no inspections, which is equivalent to the condition that q
is infinite, then

$$ I(\infty) = \lim_{q \to \infty} I(q) = 0 ; \tag{7.11} $$

the information gain is zero. These calculations agree with our intuitive
assessment of the situation.

I(q) decreases from infinity to zero as the interinspection interval
increases from zero (continuous inspection) to infinity (no inspection).
For inspection intervals equal to T we find

$$ I(1) = \frac{1}{e - 1} - \log_e\left(1 - \frac{1}{e}\right) = 1.040+ \tag{7.12} $$
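
The behaviour of I(q) can be explored by evaluating eq. (7.3) term by term. The sketch below is illustrative: it assumes the exponential survival distribution R(t) = exp(-t/T) with inspection times t_i = iqT, so that the probability of failure in the i-th interval is (1 - e^{-q}) e^{-(i-1)q}. Summing the geometric series for these probabilities gives q/(e^q - 1) - log_e(1 - e^{-q}), which the code uses as a cross-check. The function names and the truncation point `tail` are choices made for this example.

```python
import math


def interval_information(q, tail=60.0):
    """Evaluate eq. (7.3) for R(t) = exp(-t/T) and inspection times i*q*T.

    The sum is truncated once the inspection time exceeds tail * T, where
    the remaining probability is negligible.
    """
    p_first = -math.expm1(-q)              # 1 - exp(-q)
    n_terms = int(math.ceil(tail / q)) + 1
    total = 0.0
    for i in range(1, n_terms + 1):
        p = p_first * math.exp(-(i - 1) * q)
        if p == 0.0:
            break
        total -= p * math.log(p)
    return total


def closed_form(q):
    """Geometric-series sum of the same terms."""
    return q / math.expm1(q) - math.log(-math.expm1(-q))


for q in (0.1, 1.0, 10.0):
    print(q, interval_information(q), closed_form(q))
```

Frequent inspection (small q) gives a large value, and infrequent inspection (large q) gives a value near zero.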


Our objective is to determine inspection intervals so that there is
some desired relationship between them and the corresponding measure of
information.

Let inspection times $t_1, \dots, t_{n-1}$ be given and let it be
required to determine $t_n$. Moreover, consider further inspection times
$t_{n+k}$, k = 1,2,..., corresponding to equal inspection intervals

$$ t_{n+k+1} - t_{n+k} = qT , \qquad k = 1, 2, \dots ; \tag{7.13} $$

we will later let q approach infinity, and it will follow from the
calculations previously given that the information corresponding to the
latter intervals will be zero. The information corresponding to the
partition induced by $\{t_1, t_2, \dots\}$ is

$$ I = -\sum_i p_i \log_e p_i , \tag{7.14} $$

where we have abbreviated

$$ p_i = R(t_{i-1}) - R(t_i) . \tag{7.15} $$


If the desired relationship among the intervals is that each inspection
interval produces the same amount of information, then the condition is

$$ p_i \log_e p_i = \text{const.} = p_1 \log_e p_1 \tag{7.16} $$

for all i. If the survival distribution is exponential and the first
inspection time is $t_1$, then $t_2$ is determined by the equation

$$ (1 - e^{-t_1/T}) \log_e (1 - e^{-t_1/T}) = (e^{-t_1/T} - e^{-t_2/T}) \log_e (e^{-t_1/T} - e^{-t_2/T}) . \tag{7.17} $$


This is equivalent to an equation of the form

$$ (1 - e^{-x}) \left[ \log_e (1 - e^{-x}) - t_1/T \right] = \text{const.} , \tag{7.18} $$

where $x = (t_2 - t_1)/T$, and can be solved by numerical methods.


The left side of eq. (7.17) is known from observations obtained
through time $t_1$. By monitoring R(t) throughout the interval $t_1 \le t$,
one can always calculate when the incremental information satisfies
eq. (7.16), which establishes $t_2$ and the successive intervals.
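
One possible numerical treatment of eq. (7.17) is bisection. The sketch below is a hedged illustration and is not the source's procedure: it assumes the exponential survival distribution, and because p log_e p is not monotone the equation can have more than one root, so it takes the smallest t_2 greater than t_1. That choice and the function names are assumptions of this example.

```python
import math


def plogp(p):
    """p * log_e(p), with the limiting value 0 at p = 0."""
    return p * math.log(p) if p > 0.0 else 0.0


def next_equal_information_time(t1, T):
    """Smallest t2 > t1 satisfying eq. (7.17) for R(t) = exp(-t/T)."""
    a = t1 / T
    target = plogp(1.0 - math.exp(-a))          # left side of eq. (7.17)

    def gap(x):                                 # x = (t2 - t1) / T
        p2 = math.exp(-a) * (1.0 - math.exp(-x))
        return plogp(p2) - target

    # On (0, hi] the second-interval probability stays at or below 1/e,
    # where p log p is decreasing, so gap falls from positive to non-positive.
    ratio = math.exp(a - 1.0)
    lo, hi = 0.0, (-math.log1p(-ratio) if ratio < 1.0 else 50.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return t1 + T * 0.5 * (lo + hi)


t2_small = next_equal_information_time(0.2, 1.0)
t2_large = next_equal_information_time(1.0, 1.0)
print(t2_small, t2_large)
```

When the first interval has failure probability below 1/e, the root found is the one at which both intervals have equal failure probability, so the second interval is longer than the first.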

In general, if there is infant mortality, then $t_2 - t_1 > t_1$; it will
take  longer  to  acquire  additional  information  about  the  survival  distribu- 
tion after  the  epoch  of  infant  mortality  has  been  outlived.  Similarly, 
should  a wear-out  period  exist  for  aged  items,  inspection  intervals 
established  by  the  principle  of  equal  information  will  be  cooperatively 
shortened to compensate for the increased hazard rate. However, the
reduction may have a negligible practical effect because there may be too
few items surviving until advanced ages to significantly affect total
fleet maintenance cost. This is in accord with experience.

The policy outlined above responds to failure; cp. Chapter 6.

As items in the initial inventory fail, they may be renewed and
returned to service, or replaced by new items of the same type. Additions
to the operational inventory of items may also be made at various times
and in varying quantities. As a consequence, the oldest items in operation
are likely to constitute only a small fraction of the "fleet" even
if the failure rate is low.

Additions to the operational inventory and renewal of failed equipment
create complex, unpredictable, and continually varying age distributions.
Figure 7.1 illustrates an age distribution, with each renewed
item returned to service treated as distinct from all others, including
its pre-failure form. t is a measure of operational time and $t_{\mathrm{chron}}$
denotes chronological time. The total operating time until failure of
item i is denoted by $t_i$. In this figure, item No. 4 might be a
renewed version of the failed item No. 3; No. 5 is a non-initial
acquisition which has not failed during the span of chronological time
displayed in the figure.


Figure  7.1.  An  Age  Distribution  - Chronological  Display 


If the information provided in the figure is displayed in terms of
operating time t, then it can be arranged as shown in Figure 7.2.


Figure 7.2. An Age Distribution - Operational Display (axes: Item No. versus Operational Time t)


The survival distribution R(t) can be estimated from this data
for all t not greater than the operational age of the oldest item in
service. If failed items are renewed and returned to service, the sample
size for estimation of R(t) for small t will generally be significantly
larger than the total inventory of items since given renewed items share
multiple operating histories. Since an estimate of R(t) is given by the
fraction of items surviving until t, as experience accumulates, renewed
items are returned to the operating inventory, and new items are acquired,
the estimates of R(t) for small t can be repeatedly updated. As data
accumulates, the estimates of R(t) will stabilize; thus, replenishment
and expansion of the operating inventory only act to refine the estimate
of R(t) and reduce its variance. Since the failure information measure
I of eq. (7.3) is completely determined by R(t) and the inspection
intervals, it follows that the estimate of I is independent of
replenishment and expansion of the inventory except that as chronological time
passes, the estimates of I for small operating times become increasingly
reliable.
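
A minimal sketch of such an estimate from an operational display follows. It uses the Kaplan-Meier product-limit estimator, a standard technique substituted here so that items still in service can be included; the text itself speaks only of the fraction of items surviving until t. Each renewed item is entered as a separate history, as in Figure 7.1. The data are invented for illustration.

```python
def kaplan_meier(histories):
    """Product-limit estimate of the survival distribution R(t).

    histories: list of (operating_time, failed) pairs, one per item, where
    failed is False for an item still in service at that operating age.
    Returns a list of (t, R(t)) values at each observed failure time.
    """
    failure_times = sorted({age for age, failed in histories if failed})
    survival = 1.0
    curve = []
    for t in failure_times:
        at_risk = sum(1 for age, _ in histories if age >= t)
        failures = sum(1 for age, failed in histories if failed and age == t)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical operating histories in hours; False marks an unfailed item.
histories = [(100.0, True), (150.0, False), (250.0, True),
             (300.0, False), (400.0, True)]
curve = kaplan_meier(histories)
print(curve)
```

Adding further histories changes only the counts at each failure time, so the estimate at small operating ages can be updated as renewed and newly acquired items accumulate experience.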


GLOSSARY OF NOTATIONS AND TERMINOLOGY

Notations


$\in$    Set membership symbol. For '$x \in S$' read 'x is an element
of S' or 'x belongs to S'.

$\cup$, $\bigcup$    Set union symbol. $S \cup T$ is the set whose elements belong to
at least one of S and T.

$\cap$, $\bigcap$    Set intersection symbol. $S \cap T$ is the set whose elements
belong to both S and T.

$-$    Set difference symbol. $S - T$ is the set whose elements
belong to S but not to T.

$\subset$    Set inclusion symbol. $S \subset T$ signifies that each element of
S is also an element of T.

$\emptyset$    Empty set.

$\gamma_\lambda(t)$    Maintenance/failure cost density of element $\lambda$ of a
partition $\Lambda$ of system S with respect to the failure
distribution; eq. (6.7).

$\delta(x)$    Dirac delta (generalized) function; eq. (2.27).

$\eta(t)$    Hazard rate, also called failure rate; eq. (3.14).

$\lambda$    Typical element of the partition $\Lambda$ of system S; eq. (6.2).

$\Lambda$    Partition of a system S; eq. (6.2) and Figure 6.2.

$\mu$    Typical element of the partition M of system S, where
M is a refinement of $\Lambda$; Figure 6.3.

M    Partition of a system S which refines another partition
$\Lambda$; Figure 6.3.

p(t)    Failure probability density; eqs. (3.8), (3.9).


$p_\lambda(t)$    Failure probability density of element $\lambda$ of partition $\Lambda$
of system S; eq. (6.6).

$\omega$    Event in a measurable space; eq. (2.1).

$\omega(t)$    Set of items which have failed prior to time t; eqs. (3.1),
(7.2).

$\Omega$    Collection of events in a measurable space; eq. (2.1).

$c_\lambda(t)$    Cost density with respect to time, corresponding to cost
function $C_\lambda(t)$ for partition element $\lambda$; eq. (6.4).

$c'_\lambda(t)$    Imputed cost density of failure of partition element $\lambda$
per unit hazard rate of $\lambda$; eq. (6.6).

$c''_\lambda(t)$    Cost density of maintenance of partition element $\lambda$
corresponding to inspection time $t_i$; eq. (6.5).

$\chi_\omega$    Indicator function of event $\omega$; eq. (2.7).

C(t)    Maintenance/failure cost function for the complex system
S; eq. (6.8).

$C_\lambda(t)$    Maintenance/failure cost function for the element $\lambda$ of
partition $\Lambda$ of the complex system S; §6.3.

F(t)    Distribution function for failure prior to time t; eq. (3.5).

$F_\lambda(t)$    Distribution function for failure of partition element $\lambda$
prior to time t; §6.3.

$F(\omega(t))$    Probability of failure prior to time t; eqs. (3.3), (3.5).

I(q)    Information corresponding to an exponential survival
distribution and inspection intervals of duration qT with
T the mean time before failure; eq. (7.9).

$I(\Omega)$    Information corresponding to discrete partition $\Omega$; eq. (7.3).

p(x)    Probability density function corresponding to the probability
distribution $P = P_f$ of the random variable f. The random
variable is usually suppressed from the notation; eq. (2.18').

Distribution function of a fixed random variable (not
indicated by the notation), relative to the probability
measure P; eqs. (2.12), (2.13).

Distribution function of random variable f relative to
the probability measure P; eq. (2.12).

Absolutely continuous distribution function; eq. (2.16).

Discrete distribution function; eq. (2.16).

Singular distribution function; eq. (2.16).

Probability measure; eq. (2.1).

Conditional probability of event $\omega_2$ given event $\omega_1$;
eq. (2.35).

Distribution  function  for  survival  until  time  t,  also 
known  as  the  reliability;  eq.  (3.6). 

Distribution  function  for  survival  of  system  S until  time 
t;  eq.  (6.3). 

Distribution  function  for  survival  of  partition  element 
$\lambda$ of S until time t; eq. (6.3).

Probability  of  survival  until  time  t;  eq.  (3.2). 

Set  of  real  numbers. 

Maintenance/failure cost surface; eq. (7.1).
Maintenance/failure cost surface for time t; eq. (7.1).

Set of items which constitute a complex system; eq. (6.2).
Mean time before failure; Figure (3.3), eq. (7.5).


Terminology 

Bathtub curve - Typical shape of a hazard function graph; Figure 5.1.
Bayes' Principle of Inverse Probability - Figure 2.6, eq. (2.39).
Condition Monitoring - One of the three primary maintenance processes,
consisting of no scheduled preventive maintenance. Condition
monitoring depends on the surveillance and analysis program for
data collection and data analysis, upon which judgements can be
made relative to maintaining items; see [6].

Conditional  probability  - eq.  (2.35). 

Conditional  probability  of  failure  - eq.  (3.13). 

Distribution function - eq. (2.13).

Event - eq. (2.1).

Exponential survival distribution - §4.1.

Failure probability density - eq. (3.8).

Failure rate - Same as hazard function; eq. (3.14).

Gamma survival distribution - §4.5.

Hard Time - One of the three primary maintenance processes, requiring
fixed-limit removal for overhaul or time limits; see [7].

Hazard  function  - Same  as  failure  rate;  eq.  (3.14). 

Information - A measure of the organization of elements of a set
associated with some partition. Modern Information Theory was developed
by Claude Shannon in connection with communication systems during
the 1940s. Soon thereafter its relation to older ideas in statistical
mechanics and statistics was recognized, and its fundamental
role throughout the physical sciences was elaborated in numerous
articles and books, among which those by L. Brillouin and
E. Schroedinger are particularly worthy of mention. Measures of
information are now systematically employed in fields as diverse
as linguistics and psychophysics, biology and physics, communication
engineering and library science. Although originally conceived
in the context of transmission of sequences of symbols
drawn from a finite inventory with fixed probabilities, the concept
of information is more general and can be associated with any
partition of a finite set, and in certain instances with infinite
sets as well. See [6], especially eqs. (7.3) and (7.4), and
references [2], [11].

Independent random variables - §2.4, eq. (2.33).

Lebesgue-Stieltjes integral - §2.3, eq. (2.22). See also [12] for a more
general and comprehensive development.

Likelihood  ratio  - eq.  (2.39). 

Lognormal  survival  distribution  - §4.4. 

Maximum-likelihood method of estimation - eq. (2.40).

Normal  survival  distribution  - §4.2. 

On Condition - One of the three primary maintenance processes, requiring
repetitive  inspections  or  tests  to  determine  reduced  resistance 
to  failure  for  specific  failure  modes. 

Probability  density  function  - eq.  (2.18). 

Probability of failure - §3.1.

Probability of survival - eq. (3.2).

Random variable - Paragraphs following eq. (2.3) and Figure 2.2.

Weibull survival distribution - §4.3.


